Meeting 20120314

Artefactual Systems, Internal Archivematica Dev Mtg, 2012-03-14

Development

  • Mike has begun to group micro services - Issue 320
  • Joseph started work on selectable AIP storage location
  • Mark Jordan is working on DIP upload to CONTENTdm

Deployment

Testing

Documentation

Chat log

(10:46:07 AM) epmclellan1: meeting time? I can take notes
(10:46:07 AM) ARTi left the room (quit: Read error: Connection reset by peer).
(10:46:37 AM) peterVG: back
(10:46:43 AM) epmclellan1: we just lost Austin but we can still start with dev
(10:47:21 AM) courtney: are mockups dev?
(10:47:21 AM) mcantelon: I've started working on grouping jobs, in transfers, by microservice.
(10:47:28 AM) epmclellan1: great!
(10:47:43 AM) courtney: super 
(10:47:43 AM) ARTi [~austin@24.207.112.199] entered the room.
(10:47:51 AM) epmclellan1: hi ARTi
(10:47:55 AM) epmclellan1: we've just started with dev
(10:47:58 AM) courtney: epmclellan1: didn't you start grouping microservices somewhere on the wiki?
(10:48:17 AM) epmclellan1: yes, it's linked from the issue I think...
(10:48:33 AM) ARTi: yes.. dunno if the internet is unstable here, I haven't seen anything since pool table
(10:48:54 AM) epmclellan1: micro-services grouping issue is http://code.google.com/p/archivematica/issues/detail?id=320
(10:49:03 AM) epmclellan1: includes mock-up and list of micro-services
(10:49:29 AM) mcantelon: I *think* in the database they are already grouped, so it's just a matter of exposing that in the interface.
(10:49:56 AM) epmclellan1: berwin221 is that correct? ^
(10:49:56 AM) berwin221: yes
(10:50:16 AM) berwin221: it's by no means complete, but a start
(10:50:17 AM) courtney: how much will the changes we're making in transfer backup impact the microservices grouping
(10:50:24 AM) berwin221: and should give mcantelon something to work with
(10:50:38 AM) epmclellan1: courtney: not too much, I think
(10:50:45 AM) mcantelon: Yeah, it shouldn't be too much longer until I have something to show. 
(10:50:47 AM) epmclellan1: we're moving whole micro-services, not just individual tasks
(10:50:52 AM) courtney: ok
(10:51:27 AM) courtney: we should make a firm decision about which microservices are absolutely necessary for transfer backup - i'm writing up requirements today
(10:51:33 AM) epmclellan1: I like courtney's start transfer mockup
(10:51:40 AM) courtney: : )
(10:51:42 AM) epmclellan1: yes, we can talk about that after the meeting
(10:51:44 AM) peterVG: yes, nicely done
(10:52:01 AM) courtney: it's going to change significantly today - and i'm adding several more
(10:52:05 AM) epmclellan1: getting a lot of new ideas about how archivists can handle everything from accession forward
(10:52:22 AM) epmclellan1: no other system will do anything like this
(10:52:32 AM) courtney: eliminating the need for archives to do preliminary backup actions
(10:52:38 AM) courtney: which are currently haphazard at best
(10:52:51 AM) epmclellan1: and allowing them to get a better handle on their backlog
(10:52:58 AM) epmclellan1: in terms of understanding what's in it
(10:53:20 AM) epmclellan1: berwin221 dev news?
(10:53:38 AM) berwin221: dev:Work on sort of structmap:
(10:53:38 AM) berwin221: Did the default sort, and it appears to be by the binary representation of letters, so case then alphabetic
(10:54:09 AM) epmclellan1: how does it handle numbers?
(10:54:16 AM) epmclellan1: image001, image002 etc
(10:54:18 AM) peterVG: berwin221 "so case then alphabetic"?
(10:54:55 AM) peterVG: berwin221 sorry don't fully understand what the implications are
(10:55:20 AM) berwin221: dev:Work on selectable AIP storage location.
(10:55:20 AM) berwin221: Making another selection step, to pick the destination, from a specified list in the database.
(10:55:20 AM) berwin221: The selection is stored in a variable, as a replacement dict, passed down the chain.
(10:56:02 AM) berwin221: numbers get sorted in the alphabetic step
(10:56:22 AM) epmclellan1: ok
(10:56:29 AM) berwin221: http://www.asciitable.com/
(10:57:12 AM) epmclellan1: so for digitization output where e.g. one file equals a page, user needs to use naming, numbering and capitalization conventions
(10:57:20 AM) epmclellan1: which seems reasonable
(10:57:58 AM) peterVG: berwin221 does that mean 'S' will get positioned before 'r' ?
(10:58:13 AM) berwin221: yes
(10:58:24 AM) epmclellan1: is there a way around that?
(10:58:24 AM) peterVG: that's not desirable though is it?
(10:58:42 AM) ARTi: notes so far http://archivematica.org/wiki/index.php?title=Meeting_20120314
(10:58:57 AM) epmclellan1: thanks for taking notes, ARTi
(10:59:00 AM) ARTi: np
(10:59:52 AM) berwin221: is there a way around that? time and money
(11:00:02 AM) berwin221: I've only started looking at the issue
(11:00:22 AM) peterVG: berwin221 did you talk to Mike about it?
(11:00:37 AM) berwin221: no
(11:00:44 AM) mcantelon: Not sure on the problem surface, but maybe there's a way to hack in natural sorting? http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort
(11:01:06 AM) peterVG: multi-lingual alpha sorting is complex, so good to run by other devs for suggestions
(11:01:29 AM) berwin221: it's not multilingual
(11:01:31 AM) peterVG: I think we also need to be clearer on the requirement then
(11:01:43 AM) berwin221: at that point, we've stripped the unicode
(11:01:43 AM) ARTi: mcantelon: cool
(11:02:01 AM) mcantelon: Multi-lingual sorting seems like it could be complex (presuming different cultures have different ways of sorting)...
(11:02:10 AM) peterVG: so we wouldn't be able to sort any files coming in using Unicode chars?
(11:02:21 AM) peterVG: e.g. anything non-ASCII?
(11:02:26 AM) epmclellan1: We would need to sort by original name
(11:02:30 AM) mjsuhonos: i've used the unicode decimal value as a sort index, but only for the first character or few
(11:02:36 AM) epmclellan1: instead of sanitized name
(11:02:41 AM) epmclellan1: would that be possible?
(11:02:59 AM) mjsuhonos: that will cause sorting to align with the UTF-8 mapping, but don't know if that will be cultural
(11:03:25 AM) peterVG: mjsuhonos: does it also put capitalized letters before lower-case or does the Unicode decimal value respect this order?
(11:03:50 AM) mjsuhonos: it just follows the unicode planes.  IIRC, upper-case characters are all mapped together
(11:03:55 AM) mjsuhonos: aabbccAABBCC
(11:04:13 AM) peterVG: hmm, so not true natural language sorting
(11:04:16 AM) ARTi left the room (quit: Read error: Connection reset by peer).
(11:04:20 AM) peterVG: pipedream?
(11:04:23 AM) mjsuhonos: no, it's glyph sorting.
(11:04:28 AM) mjsuhonos: pipedream for sure.
(11:04:46 AM) mjsuhonos: "natural order sorting" requires normalization and maybe even transliteration.  black magic at best
(11:04:54 AM) peterVG: okay, let's establish then what is actually possible with existing libraries available to us
(11:05:07 AM) epmclellan1: makes me wonder, if the objects are supposed to form eg the pages of a book, whether the user should have some means of ordering them during ingest
(11:05:09 AM) peterVG: let's continue in separate thread, post-meeting?
(11:05:12 AM) epmclellan1: ok
(11:05:46 AM) epmclellan1: any more dev?
(11:06:00 AM) peterVG: berwin221 can you please initiate on archivematica@artefactual.com or public list? (and include mjsuhonos)
(11:06:24 AM) berwin221: k
(11:06:26 AM) peterVG: thx
(11:06:39 AM) peterVG: we've lost Austin again?
(11:06:54 AM) epmclellan1: looks like it
(11:06:59 AM) mcantelon: PyICU sounds like it might deal with Unicode sorting... http://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python
(11:07:01 AM) epmclellan1: I can finish minutes
(11:07:01 AM) peterVG: what's the ETA on 12.04 port and multi-processor VM testing
(11:07:22 AM) peterVG: that's most urgent task for him now as per last week's dev mtg
(11:07:24 AM) Sevein: 12.04 end of April
(11:07:38 AM) Sevein: well, the Ubuntu release I meant
(11:07:39 AM) peterVG: Sevein: we've started porting to 12.04beta
(11:07:46 AM) Sevein: yup, I know
(11:07:51 AM) peterVG: just wondering on ETA for completion of our package updates
(11:07:55 AM) ARTi [~austin@24.207.112.199] entered the room.
(11:08:13 AM) ARTi: bleh nets.. did I miss anything to add to notes?
(11:08:15 AM) peterVG: so that we can start multi-processor node testing
(11:08:40 AM) epmclellan1: ARTi: I'll finish notes, I'll have the whole chat log
(11:08:45 AM) peterVG: ARTi: we were talking about multilingual/UTF8 alpha sorting
(11:08:49 AM) ARTi: epmclellan1: cheers
(11:08:59 AM) peterVG: then I had a question about status of work on 12.04 porting
(11:09:27 AM) ARTi: I haven't looked at it from last week, but it's mostly done if I recall
(11:09:36 AM) peterVG: as per last week's dev meeting that is your most urgent task now, followed by multiprocessor/node testing once 12.04beta porting is completed
(11:09:55 AM) ARTi: yep, on it
(11:10:15 AM) peterVG: MarkJ is pretty much handling all of the ContentDM task so you're off hook for that
(11:11:04 AM) ARTi: cool
(11:11:34 AM) epmclellan1: re contentDM and ordering etc, I've emailed UBC Library to get more details about requirements
(11:11:47 AM) epmclellan1: their requirements may be fairly simple
(11:13:15 AM) peterVG: okay, but just to reiterate, our alpha sorting requirement should use original (pre-sanitized) filenames, sort on UTF-8 chars, and respect numbers/lower-upper case
(11:13:28 AM) epmclellan1: yes
(11:13:39 AM) epmclellan1: that's the minimum
(11:13:51 AM) peterVG: sounds like mcantelon's link above is good place to start for further investigation into how much of this is possible with existing libraries
(11:14:07 AM) epmclellan1: need to know from UBC if they have logical structure requirements beyond alphanumeric sorting
(11:14:16 AM) epmclellan1: hopefully not
(11:14:26 AM) peterVG: epmclellan1: right, related but two separate issues
(11:14:45 AM) peterVG: can someone pls update the alpha sorting issue with this updated ^ discussion 
(11:14:45 AM) epmclellan1: well, it will dictate how we structure the structMap in METS
(11:14:52 AM) epmclellan1: I can update the issue
(11:15:03 AM) peterVG: yes, but that is a separate issue from getting alpha sorting working
(11:15:07 AM) epmclellan1: right
(11:15:37 AM) peterVG: okay, that's time eh?
(11:15:51 AM) epmclellan1: k
(11:19:43 AM) epmclellan1: alpha sorting issue updated: http://code.google.com/p/archivematica/issues/detail?id=937
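
Note on the sorting discussion in the chat log: the behaviour berwin221 describes (upper case before lower case, digits compared character by character) is what Python's default string sort produces, since it compares code points. The sketch below contrasts that with a simple "natural" sort key that folds case and compares digit runs as numbers, which is roughly the minimum requirement peterVG restates at 11:13. It is only an illustration under assumptions: the filenames and the `natural_key` helper are hypothetical, not taken from the Archivematica code base, and this is not locale-aware collation (a library such as PyICU, mentioned by mcantelon, would be the place to look for that).

```python
import re


def natural_key(name):
    """Build a sort key that ignores case and compares digit runs numerically.

    Hypothetical helper for illustration; not part of Archivematica.
    """
    # re.split with a capturing group alternates text and digit runs:
    # even indexes are text (possibly empty), odd indexes are digits.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


# Made-up filenames, e.g. digitization output where one file is one page.
names = ["image10.tif", "Sample.tif", "image2.tif", "report.tif", "image001.tif"]

# Default sort: plain code point order, so 'S' lands before 'i' and 'r',
# and "image10" lands before "image2".
default_order = sorted(names)

# Natural sort: case is folded and 1 < 2 < 10.
natural_order = sorted(names, key=natural_key)

print(default_order)
print(natural_order)
```

To meet the requirement of sorting on original rather than sanitized names, the key would have to be applied to the original filename recorded for each object, with the sanitized name used only for storage.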