Difference between revisions of "Meeting 20110629"

From Archivematica

Revision as of 13:04, 29 June 2011

Development

  • Joseph has been working on the transfer area, and some changes to the MCP workflow
    • The transfer area should record events that happen to files while they are in that area (for example, being moved around), so that the original location of each file can be stored in metadata
    • There are new workflow considerations arising from the development of the transfer area; these should be captured on a wiki page that links back to our requirements
  • Re the MCP workflow: Joseph is trying to move the workflow into the database and break up the long workflow into smaller pieces, with a one-to-one relationship between microservices and workflow steps; PREMIS events may also be moved to the database so that they can be displayed in the dashboard, with the METS documents then created from the information in the database
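
The transfer-area idea above (assign each file a UUID on entry, then record every move so the original location is never lost) can be sketched roughly as follows. This is only an illustration, not the MCP code: the class names, the event fields and the event type strings ("ingestion", "movement") are all hypothetical, and an in-memory list stands in for the database.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TransferFile:
    """A file tracked in the transfer area (hypothetical structure)."""
    original_path: str
    current_path: str
    file_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


class TransferEventLog:
    """In-memory stand-in for a database table of file events."""

    def __init__(self):
        self.files = {}   # current path -> TransferFile
        self.events = []  # event dicts, in the order they happened

    def _record(self, tracked, event_type, detail):
        # Event field names are illustrative, not a real PREMIS schema.
        self.events.append({
            "file_uuid": tracked.file_uuid,
            "event_type": event_type,
            "event_detail": detail,
            "event_datetime": datetime.now(timezone.utc).isoformat(),
        })

    def file_entered(self, path):
        """Assign a UUID as soon as the file enters the transfer area."""
        tracked = TransferFile(original_path=path, current_path=path)
        self.files[path] = tracked
        self._record(tracked, "ingestion", f"entered transfer area at {path}")
        return tracked

    def file_moved(self, old_path, new_path):
        """Record a move; the original path stays on the tracked file."""
        tracked = self.files.pop(old_path)
        tracked.current_path = new_path
        self.files[new_path] = tracked
        self._record(tracked, "movement", f"moved from {old_path} to {new_path}")
        return tracked


log = TransferEventLog()
doc = log.file_entered("transfer/report.doc")
log.file_moved("transfer/report.doc", "transfer/series1/report.doc")
print(doc.original_path, "->", doc.current_path)
```

Keying the events on the file UUID rather than on the path means the history survives any amount of rearranging, which is what would let the original location be written into metadata later.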

Deployment

  • Austin successfully installed 0.7.1 at UBC Library
    • He did a reinstall on their VM and then an upgrade on the desktop; he hit a few issues while upgrading, which he logged
    • We are close to package updates now
  • Austin has installed Heritrix for website archiving; Evelyn will be doing some research/tests for clients
    • For client testing, we'll need all the website archiving tools online; Austin can install them on his public VM until he gets the gibson up and running
    • Or we can always set up a separate, temporary DH or iWeb account just for the web archiving testing; all we will need is a Debian or Ubuntu install somewhere with a public IP


Testing

Documentation

  • Evelyn did the user manual and screencast

Chat log

(10:30:50 AM) djjuhasz: Archivematica
(10:30:53 AM) epmclellan: hi!
(10:30:58 AM) berwin22: hi!
(10:31:09 AM) epmclellan: eager Archivematica team
(10:31:19 AM) epmclellan: I can take minutes
(10:31:24 AM) peterVG: hullo...
(10:31:25 AM) berwin22: thanks
(10:32:11 AM) epmclellan: dev news?
(10:32:21 AM) berwin22: I've been working on the transfer area, and some changes to the MCP workflow
(10:33:19 AM) epmclellan: can you give us a few details?
(10:33:37 AM) berwin22: the transfer area should record events that happen to files while they are in that area, moved around, to allow for original location of the file to be stored in metadata
(10:33:56 AM) epmclellan: nice
(10:34:15 AM) epmclellan: how is the information recorded? As METS?
(10:34:30 AM) berwin22: the MCP workflow; I'm trying to move to the database, and break up the long workflow into smaller pieces.
(10:34:30 AM) berwin22: Also separating the microservices from the workflow; as a one to one relationship
(10:34:47 AM) berwin22: the information is currently recorded in premis events in the database
(10:35:15 AM) epmclellan: there are new premis events?
(10:35:25 AM) berwin22: there is strong consideration of moving the premis events to the database, so they can be seen in the dashboard
(10:35:39 AM) epmclellan: interesting
(10:35:44 AM) berwin22: then creating the Mets documents based off the information in the database
(10:36:08 AM) berwin22: I'm looking at assigning file uuid as files enter the transfer area
(10:36:42 AM) berwin22: the methodology of checking producer provided checksums is a little weak, and will probably need some tweaking
(10:36:59 AM) epmclellan: ok, thanks, just getting this into the minutes
(10:37:16 AM) peterVG: we also have to consider moving up some other micro-services: extraction, name cleanup, virus check?
(10:37:28 AM) peterVG: or not?
(10:37:55 AM) epmclellan: prior to SIP creation? that would make sense, I think
(10:38:12 AM) berwin22: the problem being if the directories are re-arranged, the checksums provided by the producer for the directory may no longer be valid (missing files)
(10:38:13 AM) epmclellan: so the user is still figuring out what's in the transfer, is doing appraisal, physical arrangement etc
(10:38:29 AM) epmclellan: so we check the checksums right away, then assign new ones
(10:38:33 AM) epmclellan: once the SIP is created
(10:38:41 AM) berwin22: the virus check occurs traditionally after quarantine
(10:39:18 AM) berwin22: we could look at moving up extraction, but there is no real gain in moving name sanitization up
(10:39:56 AM) berwin22: on second thought... extraction would cause problems
(10:40:12 AM) epmclellan: do we have a wiki page for capturing these proposed workflow changes? If not, I can create one, or edit existing wiki pages
(10:40:14 AM) berwin22: the files would be added automatically by the pyinotify, and by the extraction script
(10:40:21 AM) berwin22: there would be a conflict of origin
(10:40:54 AM) epmclellan: as always, I'd like to link any workflow decisions back to our requirements
(10:41:08 AM) peterVG: let's discuss further offline today. figure out what is required/possible, then we should get a design page up for transfer accordingly
(10:41:16 AM) epmclellan: ok
(10:41:46 AM) berwin22: time & money = almost anything is possible
(10:42:08 AM) peterVG: haha. right, so let's include limited time and money into that formula
(10:42:49 AM) epmclellan: any other dev news?
(10:43:13 AM) berwin22: not from me
(10:43:28 AM) ARTi: none from me
(10:43:40 AM) epmclellan: well, we know there's some deployment news
(10:43:46 AM) berwin22: oh, I've been trying to make the changes in a sandbox, so I don't wreck the working MCP if people are working with the test version
(10:43:48 AM) epmclellan: i.e. UBC Library
(10:43:59 AM) epmclellan: berwin22: great!
(10:44:00 AM) ARTi: UBC install went well
(10:44:08 AM) ARTi: berwin22: that sounds awesome :]
(10:44:21 AM) epmclellan: actually UBC was an upgrade, right?
(10:45:00 AM) ARTi: well, I did a reinstall on their VM
(10:45:18 AM) ARTi: and then an upgrade on the desktop, there were a few issues that I hit while upgrading which I logged
(10:45:40 AM) peterVG: cool, sounds like we're close to package updates though which is great
(10:46:25 AM) ARTi: yeah
(10:46:28 AM) ARTi: todo : give epmclellan a network diagram for ubc
(10:46:51 AM) epmclellan: yes, need pretty diagram for project report
(10:47:05 AM) epmclellan: and a little description, nothing too fancy
(10:47:14 AM) peterVG: great
(10:47:35 AM) ARTi: will use http://jsplumb.org/jquery/demo.html  :]  for ultimate purtiness
(10:47:43 AM) ARTi: got heritrix up and running, doing a test crawl on archivematica.org
(10:48:10 AM) epmclellan: oh great! I'm looking forward to working with heritrix
(10:48:22 AM) peterVG: sweet
(10:48:29 AM) epmclellan: your diagram can't be too pretty, it'll outshine my diagrams
(10:48:30 AM) ARTi: I will write up some documentation for using the binary,
(10:48:47 AM) ARTi: it's pretty straightforward once you're in the UI
(10:48:55 AM) peterVG: ARTi: where did you install heritrix?
(10:49:11 AM) ARTi: just running it on my laptop
(10:49:16 AM) peterVG: k
(10:49:58 AM) peterVG: FYI. Not sure I mentioned this yesterday but we'll need all the tools running online when we demo/test with clients
(10:50:21 AM) epmclellan: yes, I think you mentioned that
(10:50:57 AM) epmclellan: on to testing?
(10:50:57 AM) peterVG: actually, we did discuss that sorry
(10:51:04 AM) epmclellan: np
(10:51:29 AM) epmclellan: not much testing from me, now that 0.7.1 is out
(10:51:32 AM) ARTi: yes, on a virtual machine * or something, I can install on my pub VM until we get the gibson up and running
(10:51:41 AM) peterVG: k
(10:52:00 AM) peterVG: wait, the 'gibson'? 
(10:52:03 AM) epmclellan: does this go into minutes? not sure where...
(10:52:09 AM) peterVG: deployment
(10:52:13 AM) epmclellan: thx
(10:52:14 AM) ARTi: I don't think I can run it on dreamhost
(10:52:30 AM) peterVG: no I don't think you should
(10:52:55 AM) ARTi: http://www.heymister.net/storage/hackthegibson%20copy.jpg  
(10:52:56 AM) peterVG: is 'gibson' your local setup?
(10:52:59 AM) ARTi: hacking the gibson ^
(10:53:10 AM) peterVG: lol
(10:53:48 AM) epmclellan: heh
(10:53:52 AM) berwin22: any feedback on 0.7.1?
(10:54:04 AM) epmclellan: UBC likes it!
(10:54:23 AM) epmclellan: peterVG: did you get some feedback at CVA?
(10:54:48 AM) peterVG: ARTi: we can always set up a separate, temporary DH or iWeb account just for the web archiving testing
(10:55:04 AM) peterVG: epmclellan: no
(10:56:57 AM) epmclellan: finished with deployment?
(10:57:10 AM) ARTi: peterVG: yes, I think all we will need is a debian or ubuntu install somewhere with a pub IP.
(10:57:34 AM) peterVG: cool, let's plan on that. 
(10:59:41 AM) epmclellan: Docs: 0.7.1 user manual and screencast are done
(11:00:03 AM) peterVG: great job on those!
(11:00:07 AM) epmclellan: thanks!
(11:00:22 AM) peterVG: in fact, great job on the 0.7.1 release everyone!!
(11:00:27 AM) epmclellan: :)
(11:00:36 AM) ***peterVG high-fiving all over
(11:01:37 AM) epmclellan: any other project news?
(11:02:07 AM) epmclellan: ok, looks like a wrap
(11:02:12 AM) ARTi: w00p!