Meeting 20120418

From Archivematica

Development

  • Joseph spent some time looking at unicode errors again: there appear to be some problems with extraction, but they may be limited to packages containing unusual character sets.
  • Mark Jordan has been doing a lot of work on CONTENTdm integration. He now has test files from UBC and is starting by testing sending a DIP to the CDM client.

Deployment

Documentation

Testing

Chat log

(10:31:03 AM) berwin22: I can take notes
(10:31:14 AM) epmclellan: thanks berwin22
(10:31:48 AM) berwin22: Dev:
(10:31:48 AM) berwin22: I took a couple of days off to deal with the business of moving.
(10:31:48 AM) berwin22: I spent some time looking at unicode errors again: there appear to be some problems with extraction, but I think it might be limited to packages containing unusual character sets.
(10:32:53 AM) berwin22: any other dev?
(10:32:59 AM) epmclellan: Mark Jordan has been doing a lot of work on CONTENTdm integration
(10:33:12 AM) epmclellan: he has test files from UBC now and testing
(10:33:26 AM) epmclellan: first he's testing sending DIP to CDM client
(10:33:31 AM) courtney: any new issues come up since our meeting with him?
(10:33:44 AM) epmclellan: I don't think so
(10:35:03 AM) epmclellan: I've started documenting requirements at http://www.archivematica.org/wiki/index.php?title=CONTENTdm_integration
(10:35:28 AM) berwin22: do we know what format we are standardizing on for thumbnails?
(10:35:56 AM) epmclellan: we haven't really discussed that
(10:36:23 AM) epmclellan: jpegs for images
(10:36:23 AM) peterVG: you know, thumbnail format
(10:36:29 AM) berwin22: .ico
(10:36:30 AM) epmclellan: yeah, thmb :)
(10:36:36 AM) epmclellan: oh
(10:36:46 AM) peterVG: no i don't think we want to go .ico
(10:36:55 AM) epmclellan: wouldn't they be little jpegs?
(10:37:02 AM) epmclellan: I don't know much about thumbnails, actually
(10:37:09 AM) peterVG: not as versatile as just standardizing on certain pixel x pixel jpg or png
(10:37:15 AM) ARTi: xpm?
(10:37:43 AM) ARTi: :P
(10:37:47 AM) ARTi: png is nice
(10:37:47 AM) peterVG: okay, needs more investigation i guess
(10:37:56 AM) epmclellan: I'll ask Mark what he thinks
(10:38:02 AM) epmclellan: he's awfully clever
(10:38:02 AM) berwin22: monochrome bitmaps ><
(10:38:09 AM) peterVG: :-)
(10:38:16 AM) ARTi: https://duckduckgo.com/c/Graphics_file_formats
(10:38:22 AM) ARTi: awesome
(10:38:26 AM) berwin22: alright... </discussion on thumbnail format>
(10:39:07 AM) epmclellan: any more dev news?
(10:39:25 AM) courtney: meeting next week about issues on wed
(10:39:31 AM) berwin22: I'm seeing a commit from mcantelon: added file explorer so far r2416
(10:39:34 AM) courtney: austin and peter optional
(10:39:36 AM) ARTi: I just posted a bug to fits regarding some of our testing..
(10:39:56 AM) ARTi: still looking around for a good place to post the pyuno, as the dev hasnt been around for close to a year
(10:40:07 AM) berwin22: ARTi: that was the one that spewed 1.1 MB of output? - we thought was frozen
(10:40:08 AM) ARTi: Ill probably post it w/ him/libre/openoffice
(10:40:22 AM) ARTi: yeah
(10:40:34 AM) ARTi: seems to have happened again with a different file
(10:40:38 AM) berwin22: link?
(10:40:46 AM) ARTi: same server
(10:41:11 AM) ARTi: or http://code.google.com/p/fits/issues/detail?id=26
(10:41:29 AM) ARTi: I said with out any..
(10:41:40 AM) ARTi: but actually its 1.1mb.. maybe you could attach that to the bug?
(10:42:33 AM) berwin22: k
(10:43:21 AM) berwin22: </dev> ?
(10:43:29 AM) berwin22: deployment
(10:43:58 AM) ARTi: http://archivematica.org/wiki/index.php?title=Scalability_testing#Multi-processor_testing
(10:44:22 AM) epmclellan: thanks for updating that, ARTi
(10:44:32 AM) epmclellan: I think it would be good to have some kind of overall problem statement
(10:44:44 AM) epmclellan: i.e. the purpose of the testing, intended outcomes etc
(10:44:53 AM) ARTi: ran some initial tests.. got some data.. maybe we should use files that we know archivematica plays nicely with too
(10:45:07 AM) berwin22: Adding up to 6 processors decreases processing time with each additional processing station.
(10:45:22 AM) epmclellan: do we know by how much?
(10:45:42 AM) ARTi: can you guys add this to the wiki as well?
(10:46:09 AM) epmclellan: sorry, add what?
(10:47:20 AM) epmclellan: ARTi: you mean what berwin22 said?
(10:47:45 AM) ARTi: what berwin said I suppose, if its right
(10:47:59 AM) ARTi: I just want to make sure when I do testing.. Im doing what Im suppose to :]
(10:48:12 AM) ARTi: our goal is to decrease processing, by adding new nodes
(10:48:14 AM) ARTi: right?
(10:48:22 AM) ARTi: processing time*
(10:48:31 AM) epmclellan: berwin22: can we measure how much the processing time decreases with each additional processing station?
(10:48:48 AM) berwin22: peterVG: confirm: Adding up to 6 processors decreases processing time with each additional processing station.
(10:48:48 AM) berwin22: if we prove that, then we've proven what we want to for this round of testing.
(10:49:42 AM) ARTi: so.. I should start doing different files that we know plays nice with everything
(10:49:47 AM) ARTi: maybe a image conversion?
(10:49:52 AM) epmclellan: ARTi: agreed
(10:49:56 AM) berwin22: epmclellan: I believe we'd need that data, to prove that statement
(10:50:23 AM) epmclellan: yes, that data would need to be included in order for users to make informed decisions about adding processors
(10:50:32 AM) courtney: wouldn't we also need to know the size of the transfer and the number of files that the statement is true for?
(10:50:49 AM) ARTi: brb
(10:50:51 AM) courtney: or a range if we test a lot
(10:51:00 AM) epmclellan: yes, those would be part of the testing metrics
(10:51:23 AM) courtney: is that in the link that ARTi just sent?
(10:51:23 AM) berwin22: courtney: if we are using the same data set for each test, those items are controlled. The variable is the "processing power" applied.
(10:51:37 AM) peterVG: agreed
(10:51:51 AM) epmclellan: but maybe we should try at least a couple of different sizes of transfers
(10:51:52 AM) courtney: so what is the control then?
(10:51:59 AM) epmclellan: number of processors
(10:52:09 AM) peterVG: i think we will want to test multiple problem statements
(10:52:18 AM) berwin22: We'll need to know things like the processing power of the machines we are using to test: cpu frequency, ram, disk
(10:52:18 AM) berwin22: and our future tests will ideally use the same.
(10:52:19 AM) courtney: i mean, what is the controlled number of files and transfer size?
(10:52:31 AM) peterVG: courtney: i think that could also be a variable
(10:52:36 AM) berwin22: ram amount, ram speed
(10:52:55 AM) epmclellan: yes, we really need these metrics added to the wiki
(10:53:00 AM) peterVG: the primary problem statement is something like: adding processors decreases processing time
(10:53:08 AM) epmclellan: right
(10:53:14 AM) peterVG: then we need to run separate tests where we change on variable at a time
(10:53:18 AM) courtney: yes
(10:53:26 AM) peterVG: one variable at a time
(10:53:33 AM) peterVG: this includes, number of files in sip
(10:53:35 AM) peterVG: number of sips
(10:53:38 AM) peterVG: number of processors
(10:53:51 AM) berwin22: certain open office conversions/normalizations are broken at the moment: that should be noted.
(10:53:52 AM) ARTi: btw I have clones made so it takes minutes to boot up new clients to add to the resources.. they are uniform in processing power and mem
(10:53:52 AM) peterVG: entire pipeline versus individual micro-services
(10:54:26 AM) peterVG: amount of ram per processor
(10:54:36 AM) peterVG: etc
(10:54:58 AM) epmclellan: I'm a little worried about getting this granular
(10:55:02 AM) peterVG: as you move along, you may find that its appropriate to add other problem statements
(10:55:03 AM) epmclellan: in terms of how much time we have for testing
(10:55:19 AM) peterVG: i think our testing is meaningless unless we are able to get this granular
(10:55:20 AM) epmclellan: can we still meet our dev deadlines?
(10:55:35 AM) berwin22: If we're truly being scientific, we should acknowledge the risk that all the VMs Austin is running are probably running off one SAN, and like my testing environment, he'll have some cap on disk speeds, but much much higher.
(10:55:51 AM) epmclellan: peterVG: how much time do you anticipate will be spent on testing?
(10:55:55 AM) courtney: i think it means very little to users without info about the size of transfer/# of files
(10:55:56 AM) peterVG: so we need to add that ^ as part of our environment description
(10:56:02 AM) peterVG: courtney: agreed
(10:56:06 AM) epmclellan: courtney: those will be included
(10:56:45 AM) peterVG: epmclellan: I am okay if the majority of ARTi's time is spent on testing for the next 2 weeks
(10:56:48 AM) courtney: just qualifying that those should be there "at the very least" even if we drop some of the granularity for deadlines
(10:56:53 AM) berwin22: I'll need to do a new one for individual microservices.
(10:57:00 AM) epmclellan: ok, so mainly ARTi, that way berwin22 can focus on dev
(10:57:05 AM) peterVG: with support from epmclellan courtney and berwin22 to help formulate tests 
(10:57:06 AM) berwin22: these are the job level, not the microservice level, correct?
(10:57:13 AM) berwin22: new query*
(10:57:25 AM) ARTi: yeah, I can rock out on lots.. just need direction ^^
(10:57:48 AM) epmclellan: ok, I will update the wiki with this discussion
(10:57:51 AM) epmclellan: as best I can
(10:58:26 AM) courtney: want to have a quick chat offline to solidify testing parameters?
(10:58:31 AM) courtney: after meeting?
(10:58:36 AM) epmclellan: sure
(10:59:47 AM) epmclellan: documentation?
(11:00:07 AM) epmclellan: I've added http://www.archivematica.org/wiki/index.php?title=CONTENTdm_integration
(11:00:08 AM) courtney: nada
(11:00:16 AM) epmclellan: still working on it
(11:00:36 AM) courtney: i'll have a lot up this weekend pre-mtg next Wed.
(11:00:44 AM) courtney: will send out email
(11:01:54 AM) JessicaB: Sevein: ping
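
The test design discussed in the log above (a fixed baseline, one variable changed per run, and processing time compared against processor count) could be sketched roughly as follows. This is only an illustration, not the project's test harness: the RunConfig fields, the function names, and every number are hypothetical placeholders rather than measurements or agreed parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    """One test run. Field names are assumptions based on the variables
    mentioned in the discussion, not an agreed schema."""
    processors: int
    files_per_sip: int
    sips: int
    ram_gb_per_processor: int


def one_variable_at_a_time(baseline, variations):
    """Yield the baseline, then configs that differ from it in exactly one field."""
    yield baseline
    for field, values in variations.items():
        for value in values:
            if value != getattr(baseline, field):
                yield replace(baseline, **{field: value})


def speedups(timings):
    """Map processor count to speedup relative to the single-processor run.

    `timings` maps processor count to processing time in seconds and must
    include an entry for 1 processor.
    """
    base = timings[1]
    return {n: base / t for n, t in sorted(timings.items())}


# Placeholder baseline and variations, for illustration only.
baseline = RunConfig(processors=1, files_per_sip=100, sips=1, ram_gb_per_processor=2)
variations = {"processors": [1, 2, 3, 4, 5, 6], "files_per_sip": [100, 1000]}
runs = list(one_variable_at_a_time(baseline, variations))
for run in runs:
    print(run)
```

Recording the machine description (cpu frequency, ram, disk, shared storage) alongside each run would still be needed for the comparison to mean anything, as noted in the discussion.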