Revision as of 15:12, 21 September 2011 by Joseph
- Joseph modified create METS so it can run on a transfer and create a fileSec.
- Joseph did some source cleanup, removing old files.
- Joseph implemented a couple of alternative workflows: send to quarantine or not; generate DIP and AIP, or just AIP.
- Joseph added a script for the dev environment that empties all the watched directories, and the database cleanup script now calls it.
- Austin built an Elasticsearch Ubuntu package.
- No testing this week.
 Chat log
(10:36:07 AM) berwin22: who's taking notes?
(10:36:24 AM) epmclellan: I did last week, soooo...
(10:36:46 AM) berwin22: I guess I can... might be a post meeting thing though
(10:37:25 AM) berwin22: I did a bunch of source cleanup, removing old files and such
(10:38:08 AM) ARTi: just finished building elasticsearch ubuntu package
(10:38:09 AM) berwin22: Modified the create mets, so it can run on a transfer and create a filesec
(10:38:20 AM) mjsuhonos: ARTi: ROCK
(10:38:25 AM) peterVG: ARTi: nice
(10:38:27 AM) mjsuhonos: please send infos
(10:38:31 AM) berwin22: nice ARTI
(10:38:40 AM) ARTi: https://launchpad.net/~archivematica/+archive/externals-dev/+sourcepub/1941170/+listing-archive-extra
(10:39:05 AM) berwin22: modified the MCP to optionally make decisions based on an xml file in the SIP
(10:39:21 AM) ARTi: mjsuhonos:
(10:39:21 AM) ARTi: add-apt-repository ppa:archivematica/externals-dev
(10:39:21 AM) ARTi: aptitude update
(10:39:21 AM) ARTi: aptitude install elasticsearch
(10:39:22 AM) berwin22: Modified transfer workflow to include a default on
(10:39:25 AM) ARTi: just testing now
(10:39:27 AM) berwin22: one*
(10:39:58 AM) ARTi: oh, and
(10:39:58 AM) ARTi: sudo start elasticsearch
(10:40:23 AM) berwin22: Implemented a couple of alternative workflows: send to quarantine/not; generate dip and AIP/just AIP
(10:40:26 AM) ARTi: dont get too excited though.. just testing now ;)
(10:40:32 AM) peterVG: berwin22: nice
(10:40:49 AM) berwin22: got client name included in the log file of the daemon
(10:40:58 AM) peterVG: ^ our two most likely customizations of default workflow
(10:41:17 AM) berwin22: I think the daemon is sometimes coughing on the gearman server not being started yet - ARTi do we have an issue on this?
(10:41:38 AM) ARTi: berwin22: I havnt seen that issue
(10:41:52 AM) ARTi: I can look/work on it though
(10:42:09 AM) berwin22: I added a script for the dev enviroment, to empty all the watched directories, and included calling it in the database cleanup script
(10:42:19 AM) peterVG: berwin22: my understanding is that the TRIM sample sent by CoV wasn't an actual export
(10:42:22 AM) berwin22: ARTi please and thanks
(10:42:31 AM) ARTi: berwin22: do you have a open issue?
(10:42:38 AM) peterVG: just copy of export schema? I hounded them yesterday to pull some actual sample exports and send them to us
(10:42:38 AM) berwin22: peterVG correct
(10:42:53 AM) peterVG: seems to be a lot of miscommunication between CVA and TRIM tech team
(10:42:54 AM) ARTi: I can create one otherwise
(10:43:18 AM) peterVG: berwin22: so don't bother with that. we can start and continue with Dspace import first
(10:43:26 AM) berwin22: agreed
(10:43:55 AM) berwin22: I'll need to spend some time this week learning elastic search/dspace stuff
(10:44:10 AM) peterVG: k
(10:44:16 AM) epmclellan: berwin22: I'll try to have the dspace stuff done asap
(10:44:34 AM) berwin22: Sevein did some work on the dashboard
(10:44:44 AM) berwin22: cheers
(10:44:49 AM) mjsuhonos: peterVG: is it worth having a quick chat or email thread on an elasticsearch primer in the next week?
(10:45:37 AM) berwin22: mj I could probably benefit, or if you wanted to send out some links you think we'd benefit from
(10:45:50 AM) djjuhasz: mjsuhonos: I think Release 1.2 features need to be priority?
(10:46:28 AM) peterVG: mjsuhonos: we want to just send xml coming into Archivematica verbatim to ES for indexing (looking to do that with DSpace METS/MODS xml). So no data mapping on import. Presumably we can run a XML to JSON conversion and feed that to ES
(10:47:05 AM) mjsuhonos: djjuhasz: yes for sure. peterVG: ok, I'll just be around for any questions/clarification. (sorry berwin22)
(10:47:17 AM) peterVG: mjsuhonos: yes, sit tight on ES right now. we'll start working on it as part of Archivematica and then as part of ArchivesSpace data model analysis/proposal
(10:47:24 AM) mjsuhonos: *understood*
(10:47:31 AM) peterVG: *sit tight on ES primer*
(10:47:37 AM) peterVG: might be worthwhile to do when you are in Van
(10:47:44 AM) mjsuhonos: ok
(10:48:47 AM) peterVG: mjsuhonos: but would appreciate your feedback/help as Austin and Joseph get it going for Archivematica starting with the index of imported XML
(10:49:43 AM) mjsuhonos: got it
(10:49:49 AM) peterVG: "Presumably we can run a XML to JSON conversion and feed that to ES" -- since ES takes nested JSON presumably it can just mirror any hierarchies that are in the incoming XML docs?
(10:50:09 AM) mjsuhonos: let's start a thread on that. XML to JSON is not 100% lossless
(10:50:37 AM) mjsuhonos: but yes, should be doable
(10:50:45 AM) peterVG: hmmm. okay, can you start a thread with what you know about that so far
(10:51:10 AM) mjsuhonos: would be easier if someone can explain what we have as incoming XML (mets?) and what we want to do.
(10:52:37 AM) mjsuhonos: ok, i'll take a stab based on what you've said so far and go from there.
(10:55:58 AM) peterVG: we'll have all kinds of xml exported from various source systems. Rather than try to map each one to a core Archivematica data model, we'll just index the values carried in the XML document using the verbatim XML document structure. That could be METS, MODS of varying profiles, other open standards or custom XML schemas from apps like TRIM. epmclellan: can you pls send you a sample of the METS/MODS that's coming out of Dspace 1.7 to mjsuhonos. Then we wan
(10:56:24 AM) epmclellan: yes, I can do that
(10:56:42 AM) peterVG: the point is that it would be 'schema-free'
(10:56:44 AM) epmclellan: there are different types of exports from DSpace, I'm trying to figure out what the best one would be
(10:57:02 AM) mjsuhonos: peterVG: ok, sounds good
(10:57:12 AM) peterVG: 'the point is that it would be 'schema-free' <- from Archivematica's perspective
(10:57:25 AM) mjsuhonos: epmclellan: as yesterday, let me know what i can do
(10:57:42 AM) epmclellan: thanks, I've had some evolution in my ideas over the past day or two
(10:58:02 AM) epmclellan: mptr isn't the way to go, we can talk about it after the meeting
(10:58:03 AM) mjsuhonos: :)
(10:58:16 AM) peterVG: this would an index built to support the transfer function. Archivematica will also maintain a seperate ES index for AIP content which will have an Archivematica document model behind it
(10:58:46 AM) peterVG: ^ i.e. to search all metadata about the preserved digital objects in archival storage
(10:59:40 AM) djjuhasz: are you planning to add document text to the index at some point to, or just metadata?
(11:00:12 AM) peterVG: full-text content too
(11:00:42 AM) peterVG: mjsuhonos: my thinking is that we create a seperate ES index for each transfer
(11:00:58 AM) mjsuhonos: that may make the most sense. we can discuss
(11:02:34 AM) peterVG: for each of those we index the native XML metadata, full-text index any text-based files, display security/privacy keywords, run FITS and add XML to ES, generate visualizations from transfer metadata in ES
(11:03:16 AM) mjsuhonos: you are full of crazy ideas man.
(11:03:57 AM) peterVG: that should be my business card title: crazy idea man
(11:05:52 AM) peterVG: okay, mtg over. didn't talk testing
(11:06:10 AM) epmclellan: no testing from me, anyway
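
Note on the XML to JSON idea raised in the log: the following is a minimal sketch of a "schema-free" conversion that mirrors the XML hierarchy as nested JSON, using only the Python standard library. It is an illustration, not the approach the team settled on. The function names (strip_ns, element_to_dict, xml_to_json), the "@" and "#text" key conventions, and the sample MODS fragment are all assumptions made up for this example. It also shows where the conversion is lossy, as mjsuhonos warned: namespaces are stripped, the relative order of differently named siblings is lost, and text that follows a child element (mixed content) is dropped. The call that would send the result to ES is deliberately left out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample, not an actual DSpace export.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Annual report</title></titleInfo>
  <name type="personal"><namePart>Doe, Jane</namePart></name>
  <name type="personal"><namePart>Roe, Richard</namePart></name>
</mods>"""


def strip_ns(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on names (lossy)."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem):
    """Convert one XML element into a nested dict mirroring its hierarchy."""
    node = {}
    # Attributes go under "@name" keys so they cannot clash with child tags.
    for name, value in elem.attrib.items():
        node["@" + strip_ns(name)] = value
    # Children are always grouped into lists keyed by tag name, so repeated
    # elements survive, but the order of differently named siblings is lost.
    for child in elem:
        node.setdefault(strip_ns(child.tag), []).append(element_to_dict(child))
    # Only the element's leading text is kept; tail text is dropped.
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml_string):
    """Parse an XML string and return a dict keyed by the root tag."""
    root = ET.fromstring(xml_string)
    return {strip_ns(root.tag): element_to_dict(root)}


def xml_to_json(xml_string):
    """Return the nested-dict form serialized as a JSON string."""
    return json.dumps(xml_to_dict(xml_string), sort_keys=True)


if __name__ == "__main__":
    print(json.dumps(xml_to_dict(SAMPLE), indent=2, sort_keys=True))
```

Always wrapping children in lists keeps the JSON shape the same whether an element appears once or many times, which is one way to avoid a field being a single object in one document and a list in the next.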