Meeting 20120314
Artefactual Systems, Internal Archivematica Dev Mtg, 2012-03-14
Development
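- Mike has begun grouping transfer jobs by micro-service in the interface - Issue 320: http://code.google.com/p/archivematica/issues/detail?id=320 (includes mock-up and list of micro-services)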
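- Joseph started work on selectable AIP storage location: a new selection step picks the destination from a specified list in the database, and the selection is passed down the chain as a replacement dictionary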
- Mark Jordan is working on DIP upload to CONTENTdm
- Evelyn has been discussing requirements with him
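- Joseph is implementing structMap alphabetical ordering - Issue 937: http://code.google.com/p/archivematica/issues/detail?id=937
- The default sort appears to order by the binary representation of characters, so upper-case letters sort before lower-case ones (e.g. 'S' before 'r')
- Our alpha sorting requirement should use original (pre-sanitized) filenames, sort on UTF-8 chars, and respect numbers and lower/upper case. PyICU sounds like it might deal with Unicode sorting: http://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python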
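To illustrate the sorting problem and one possible direction, here is a minimal Python sketch using only the standard library. It is not the Archivematica implementation: the helper name natural_sort_key and the sample filenames are hypothetical, and the sketch assumes "respect numbers and lower/upper case" means digit runs compare by numeric value and letters compare case-insensitively. Text is still ordered by code point after case folding, so this is not locale-aware collation; that would need a collation library such as the PyICU option mentioned above.

```python
import re
import unicodedata

# Runs of ASCII digits are compared as numbers rather than character by character.
_DIGIT_RUN = re.compile(r"([0-9]+)")


def natural_sort_key(filename):
    """Hypothetical helper: sort key for an original (pre-sanitized) filename."""
    # Normalize so composed and decomposed forms of a character compare equal.
    normalized = unicodedata.normalize("NFC", filename)
    key = []
    # Splitting on a capturing group puts the digit runs at odd indices.
    for index, part in enumerate(_DIGIT_RUN.split(normalized)):
        if index % 2:
            key.append((0, int(part), ""))       # numeric chunk
        else:
            key.append((1, 0, part.casefold()))  # text chunk, case-insensitive
    # The normalized name breaks ties between names that differ only in case.
    return (key, normalized)


# Hypothetical filenames, for illustration only.
names = ["image10.tif", "Sample.tif", "image2.tif", "readme.txt", "image001.tif"]

default_order = sorted(names)
natural_order = sorted(names, key=natural_sort_key)

print(default_order)
print(natural_order)
```

With the default sort, 'Sample.tif' comes first and 'image10.tif' lands before 'image2.tif'. With the key above, the image files come out as 001, 2, 10, and 'Sample.tif' sorts after 'readme.txt'.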
Deployment
Testing
Documentation
- Courtney is working on the web-based transfer interface: http://archivematica.org/wiki/index.php?title=File_Browser_Requirements#START_TRANSFER
Chat log
(10:46:07 AM) epmclellan1: meeting time? I can take notes
(10:46:07 AM) ARTi left the room (quit: Read error: Connection reset by peer).
(10:46:37 AM) peterVG: back
(10:46:43 AM) epmclellan1: we just lost Autsin but we can still start with dev
(10:47:21 AM) courtney: are mockups dev?
(10:47:21 AM) mcantelon: I've started working on grouping jobs, in transfers, by microservice.
(10:47:28 AM) epmclellan1: great!
(10:47:43 AM) courtney: super
(10:47:43 AM) ARTi [~austin@24.207.112.199] entered the room.
(10:47:51 AM) epmclellan1: hi ARTi
(10:47:55 AM) epmclellan1: we've just started with dev
(10:47:58 AM) courtney: epmclellan1: didn't you start grouping microservices somewhere on the wiki?
(10:48:17 AM) epmclellan1: yes, it's linked from the issue I think...
(10:48:33 AM) ARTi: yes.. dunno if the internet is unstable here, I havnt seen anything since pool table
(10:48:54 AM) epmclellan1: micro-services grouping issue is http://code.google.com/p/archivematica/issues/detail?id=320
(10:49:03 AM) epmclellan1: includes mock-up and list of micro-services
(10:49:29 AM) mcantelon: I *think* in the database there are already grouped, so it's just a matter of exposing that in the interface. berwin22 berwin221
(10:49:56 AM) epmclellan1: berwin221 is that correct? ^
(10:49:56 AM) berwin221: yes
(10:50:16 AM) berwin221: it's by no means complete, but a start
(10:50:17 AM) courtney: how much will the changes we're making in transfer backup impact the microservices grouping
(10:50:24 AM) berwin221: and should give mcantelon something to work with
(10:50:38 AM) epmclellan1: courtney: not too much, I think
(10:50:45 AM) mcantelon: Yeah, it shouldn't be too much longer until I have something to show.
(10:50:47 AM) epmclellan1: we're moving whole micro-services, not just individual tasks
(10:50:52 AM) courtney: ok
(10:51:27 AM) courtney: we should make a firm decision about which microservices are absolutely necessary for transfer backup - i'm writing up requirements today
(10:51:33 AM) epmclellan1: I like courtney's start transfer mockup
(10:51:40 AM) courtney: : )
(10:51:42 AM) epmclellan1: yes, we can talk about that after the meeting
(10:51:44 AM) peterVG: yes, nicely done
(10:52:01 AM) courtney: it's going to change significantly today - and i'm adding several more
(10:52:05 AM) epmclellan1: getting a lot of new ideas about how archivists can handle everything from accession forward
(10:52:22 AM) epmclellan1: no other system will do anything like this
(10:52:32 AM) courtney: eliminating the need for archives to do preliminary backup actions
(10:52:38 AM) courtney: which are currently haphazard at best
(10:52:51 AM) epmclellan1: and allowing them to get a better handle on their backlog
(10:52:58 AM) epmclellan1: in terms of understanding what's in it
(10:53:20 AM) epmclellan1: berwin221 dev news?
(10:53:38 AM) berwin221: dev:Work on sort of structmap:
(10:53:38 AM) berwin221: Did the default sort, and it appears to be by the binary representation of letters, so case then alphabetic
(10:54:09 AM) epmclellan1: how does it handle numbers?
(10:54:16 AM) epmclellan1: image001, image002 etc
(10:54:18 AM) peterVG: berwin221 "so case then alphabetic"?
(10:54:55 AM) peterVG: berwin221 sorry don't fully understand what the implications are
(10:55:20 AM) berwin221: dev:Work on selectable AIP storage location.
(10:55:20 AM) berwin221: Making another selection step, to pick the destination, from a specified list in the database.
(10:55:20 AM) berwin221: The selection is stored in a variable, as a replacement dic, passed down the chain.
(10:56:02 AM) berwin221: numbers get sorted in the alphabetic step
(10:56:22 AM) epmclellan1: ok
(10:56:29 AM) berwin221: http://www.asciitable.com/
(10:57:12 AM) epmclellan1: so for digitization output where eg one file equals a page, user needs to use naming, numberin and capitalization conventions
(10:57:20 AM) epmclellan1: which seems reasonable
(10:57:58 AM) peterVG: berwin221 does that mean 'S' will get positioned before 'r' ?
(10:58:13 AM) berwin221: yes
(10:58:24 AM) epmclellan1: is there a way around that?
(10:58:24 AM) peterVG: that's not desirable though is it?
(10:58:42 AM) ARTi: notes so far http://archivematica.org/wiki/index.php?title=Meeting_20120314
(10:58:57 AM) epmclellan1: thanks for taking notes, ARTi
(10:59:00 AM) ARTi: np
(10:59:52 AM) berwin221: is there a way around that? time and money
(11:00:02 AM) berwin221: I've only started looking at the issue
(11:00:22 AM) peterVG: berwin221 did you talk to Mike about it?
(11:00:37 AM) berwin221: no
(11:00:44 AM) mcantelon: Not sure on the problem surface, but maybe there's a way to hack in natural sorting? http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort
(11:01:06 AM) peterVG: multi-lingual alpha sorting is complex, so good to run by other devs for suggestions
(11:01:29 AM) berwin221: it's not multilingual
(11:01:31 AM) peterVG: I think we also need to be clearer on the requirement then
(11:01:43 AM) berwin221: at that point, we've stripped the unicode
(11:01:43 AM) ARTi: mcantelon: cool
(11:02:01 AM) mcantelon: Multi-lingual sorting seems like it could be complex (presuming different culturing have different ways of sorting)...
(11:02:10 AM) peterVG: so we wouldn't be able to sort any files coming in using Unicode chars?
(11:02:21 AM) peterVG: e.g. anything non-ASCII?
(11:02:26 AM) epmclellan1: We would need to sort by original name
(11:02:30 AM) mjsuhonos: i've used the unicode decimal value as a sort index, but only for the first character or few
(11:02:36 AM) epmclellan1: instead of sanitized name
(11:02:41 AM) epmclellan1: would that be possible?
(11:02:59 AM) mjsuhonos: that will cause sorting to align with the UTF-8 mapping, but don't know if that will be cultural
(11:03:25 AM) peterVG: mjsuhonos: does it also put capitalized letters before lower-case or does the Unicode decimal value respect this order?
(11:03:50 AM) mjsuhonos: it just follows the unicode planes. IIRC, upper-case characters are all mapped together
(11:03:55 AM) mjsuhonos: aabbccAABBCC
(11:04:13 AM) peterVG: hmm, so not true natural language sorting
(11:04:16 AM) ARTi left the room (quit: Read error: Connection reset by peer).
(11:04:20 AM) peterVG: pipedream?
(11:04:23 AM) mjsuhonos: no, it's glyph sorting.
(11:04:28 AM) mjsuhonos: pipedream for sure.
(11:04:46 AM) mjsuhonos: "natural order sorting" requires normalization and maybe even transliteration. blag magic at best
(11:04:54 AM) peterVG: okay, let's establish then what is actually possible with existing libraries available to us
(11:05:07 AM) epmclellan1: makes me wonder, if the objects are supposed to form eg the pages of a book, whether the user should have some means of ordering them during ingest
(11:05:09 AM) peterVG: let's continue in seperate thread, post-meeting?
(11:05:12 AM) epmclellan1: ok
(11:05:46 AM) epmclellan1: any more dev?
(11:06:00 AM) peterVG: berwin221 can you please initiate on archivematica@artefactual.com or public list? (and include mjsuhonos)
(11:06:24 AM) berwin221: k
(11:06:26 AM) peterVG: thx
(11:06:39 AM) peterVG: we've lost Austin again?
(11:06:54 AM) epmclellan1: looks like it
(11:06:59 AM) mcantelon: PyICU sounds like it might deal with Unicode sorting... http://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python
(11:07:01 AM) epmclellan1: I can finish minutes
(11:07:01 AM) peterVG: what's the ETA on 12.04 port and multi-processor VM testing
(11:07:22 AM) peterVG: that's most urgent task for him now as per last week's dev mtg
(11:07:24 AM) Sevein: 12.04 end of April
(11:07:38 AM) Sevein: well, the Ubuntu release I meant
(11:07:39 AM) peterVG: Sevein: we've started porting to 12.04beta
(11:07:46 AM) Sevein: yup, I know
(11:07:51 AM) peterVG: just wondering on ETA for completion of our package updates
(11:07:55 AM) ARTi [~austin@24.207.112.199] entered the room.
(11:08:13 AM) ARTi: bleh nets.. did I miss anything to add to notes?
(11:08:15 AM) peterVG: so that we can start multi-processor node testing
(11:08:40 AM) epmclellan1: ARTi: I'll finish notes, I'll have the whole chat log
(11:08:45 AM) peterVG: ARTi: we were talking about multilingual/UTF8 alpha sorting
(11:08:49 AM) ARTi: epmclellan1: cheers
(11:08:59 AM) peterVG: then I had a question about status of work on 12.04 porting
(11:09:27 AM) ARTi: I havnt looked at it from last week, but its mostly done if I recall
(11:09:36 AM) peterVG: as per last week's dev meeting that is your most urgent task now, followed by multiprocesser/node testing once 12.04beta porting is completed
(11:09:55 AM) ARTi: yep, on it
(11:10:15 AM) peterVG: MarkJ is pretty much handling all of the ContentDM task so you're off hook for that
(11:11:04 AM) ARTi: cool
(11:11:34 AM) epmclellan1: re contentDM and ordering etc, I've emailed UBC Library to get more details about requirements
(11:11:47 AM) epmclellan1: their requirements may be fairly simple
(11:13:15 AM) peterVG: okay, but just to reiterate, our alpha sorting requirement should use original (pre-sanitized) filenames, sort on UTF-8 chars, and respect numbers/lower-upper case
(11:13:28 AM) epmclellan1: yes
(11:13:39 AM) epmclellan1: that's the minimum
(11:13:51 AM) peterVG: sounds like mcantelon's link above is good place to start for further investigation into how much of this is possible with existing libraries
(11:14:07 AM) epmclellan1: need to know from UBC if they have logical structure requirements beyond alphanumeric sorting
(11:14:16 AM) epmclellan1: hopefully not
(11:14:26 AM) peterVG: epmclellan1: right, related but two seperate issues
(11:14:45 AM) peterVG: can someone pls update the alpha sorting issue with this updated ^ discussion
(11:14:45 AM) epmclellan1: well, it will dictate how we structure the structMap in METS
(11:14:52 AM) epmclellan1: I can update the issue
(11:15:03 AM) peterVG: yes, but that is a seperate issue from getting alpha sorting working
(11:15:07 AM) epmclellan1: right
(11:15:37 AM) peterVG: okay, that's time eh?
(11:15:51 AM) epmclellan1: k
(11:19:43 AM) epmclellan1: alpha sorting issue updated: http://code.google.com/p/archivematica/issues/detail?id=937