CVA Digital Archives Team meeting March 24, 2011
Present: Heather, Sue, Courtney, Glenn, Evelyn, Peter, Jessica
Scope of work for 2011
The goal for VANOC records is to be using Archivematica in production by December 2011 (Archivematica 0.8). By that time, the AIP structure will be fully implemented, the DIPs in ICA-AtoM will be linked to the AIPs and metadata updates to the DIPs will be capable of being synced with the metadata in the AIPs. It was noted that it would be good to have a lot of DIPs in the access system by the end of 2012, since the project is getting a lot of press.
For the remainder of 2011, the majority of the work by CVA will be the preparation of SIPs. These SIPs will be ingested into the production version of Archivematica starting in December 2011. Since the production of SIPs will hopefully be aided considerably by the use of Fiwalk and Curator's Workbench, Artefactual will package and install these tools for testing at CVA as soon as possible. Archivists' Toolkit may also be useful. If these tools are not adequate, Artefactual will write "plain vanilla" scripts to assist in the preparation of SIPs.
The SIPs will need to be placed into network storage until they can be ingested into Archivematica. Courtney will physically transfer the SIPs via detachable media to the network.
Use of ICA-AtoM
Courtney has been working on arranging and describing the VANOC records and should be done by the end of next week (it consists of one fonds and approximately 50 series). CVA will install ICA-AtoM version 1.1 and Courtney will enter the descriptions into it. This will be a production version of ICA-AtoM, and CVA's existing archival descriptions will ultimately be migrated to it. This ICA-AtoM will be set up on the City network. CVA staff need to set that in motion - i.e. get IT approval etc. Also, CVA needs to answer the question of where to store DIPs on the City Network. Jessica will be the Artefactual liaison for installation and use of ICA-AtoM.
CVA needs to talk to IT about whether ICA-AtoM has to incorporate the City's website theme, especially since there is a major web redesign underway at the City. Adopting the "look and feel" of the website wouldn't be too hard but if they want integrated search (i.e. archival materials to be retrieved in general City website searches) then that will be more complex. Heather will touch base with someone on the web development project to find out the requirements.
Scalability testing at CVA
Austin has to do scalability testing at CVA - it would be good to know the upper limit before CVA starts forming SIPs. Then we could have a more rational discussion with Gordon (last name?) about how to optimize CVA's hardware for performance. May or June might be the right time to get Gordon involved. Sue will send him an email letting him know about this discussion.
Video preservation work with D. Rice
Sue provided an activity diagram for defining the questions to be addressed by David Rice. Evelyn, Sue, Glenn and Courtney will teleconference with D. on Tuesday March 29 at 9:00 a.m.
Archivematica will likely need to use tools in addition to FITS to characterize video files. FITS output is important for creating the PREMIS files, but additional metadata from something like MediaInfo could be stored in a separate file or be incorporated into PREMIS. Video-specific metadata extraction and normalization instructions would be something that could be included in configuration-specific file(s) included in a SIP - i.e. the files would contain processing instructions that would override defaults.
Re preservation formats, it would be nice to have compressed and uncompressed options for born-digital video files. We also need to determine which tools can be incorporated into Archivematica.
Re format conversions done prior to ingest (i.e. conversion from analogue to digital), Evelyn will ask Joseph what format the conversion logs produced outside of Archivematica should be in.
- Install Fiwalk and Curator's Workbench at CVA (Austin)
- Test Fiwalk and Curator's Workbench as SIP creation tools (Courtney and Glenn)
- Install ICA-AtoM 1.1 at CVA (Heather to start the process by talking to IT)
- Talk to IT about whether ICA-AtoM installed at CVA will need to conform to City's website design standards (Heather)
- Conduct Archivematica scalability testing at CVA (Austin)
- Email Gordon about current state of discussion re optimizing CVA's hardware for performance (Sue)
- Meet with D. Rice re video preservation (Sue, Courtney, Glenn, Evelyn)
- Ask Joseph about required formats for video reformatting logs ingested into Archivematica (Evelyn)
Action items from meeting of October 5, 2010
- Image the MS Sharepoint server and send it back to IT (Glenn, Courtney, Sue)
- Review transfer/submission tools for possible inclusion in Archivematica (Courtney)
- Add transfer/submission tools to Archivematica as standalone tools for now; include ability to read log file data into the AIP. (Austin,
- Start scalability testing for Archivematica pre-0.7 (Artefactual
- Determine the metadata requirements for an accession record - include the three appraisal processes, the logs being created by digital forensics tools, the Archivists' Toolkit accession record schema, the TAPER project, submission agreement(s). The metadata will be captured as an Archivematica XML schema which will slot into the dmdSec of the AIP METS.xml (Evelyn, Courtney, Glenn, Sue, Peter)
- Review proposed PREMIS metadata elements at http://www.archivematica.org/wiki/index.php?title=Metadata_elements
- Determine structure of METS file for the AIP - to be compatible with the RXP specification (Evelyn, to be reviewed by Glenn, Courtney, Sue,
- Revise the Vancouver Digital Archives UML diagrams, if possible before 0.7 (Glenn, Courtney, Sue, Peter, Evelyn)
- Capture MS Sharepoint top-level structure, possibly as web capture through httrack (it's a web interface to shared network structure) (Courtney, Glenn)
- Determine how to export from Sharepoint, especially how to capture the metadata from each site (Courtney, Glenn, Peter, Evelyn)
Action items from meeting of August 4, 2010
Notes and action items from Vancouver Digital Archives meeting August 4, 2010
- Archivematica 0.7 is scheduled for release in November but we can push out changes to CVA earlier as needed.
- We have to finalize the metadata we want as part of our METS.xml file before we ingest SIPs into 0.7.
- May want to include RXP (from TIPR) - the spec is done
- Glenn to contact Joe Tennis about his input
- Other main concerns are scalability and automation, which will be a large part of Archivematica 0.7 development
- Another concern is the addition of more normalization paths
- Evelyn to test ingesting different kinds of SIPs on-site at CVA, starting Monday
- Currently Courtney has ca. 23 TB of digital objects at her desk
- There have been all kinds of copying problems: formatting, permissions, mounting; they have all kinds of workarounds. The problems are caused by the multiplicity of hardware and operating systems and are not Archivematica-related
- Courtney has documented what works (eg optical procedures & other
- CVA IT is working on imaging the drives instead of copying them
- What are the requirements for a SIP creation tool? CVA is concerned about turning all this material into SIPs. If they tell Artefactual what they want Artefactual can develop a tool or some kind of workflow methodology
- How does the process of arrangement and description fit with the process of creating SIPs? I.e. how much of it is incorporated into the SIP creation process, eg through the addition of DC metadata?
- A lot of appraisal will have to take place prior to ingest; there's a lot of stuff that lacks archival value
- When we're testing SIP ingest we'll test on backup copies from the copies made of the test content drives so that we don't have to go back to the original donor's drives if we have a problem with the SIPs
- How to extract digital objects from Sharepoint?
- Should start with appraisal
- What record types does it have?
- Does any functionality of SP need to be preserved?
- Is SP a record-keeping system? How much of the system needs to be documented?
- Glenn and Courtney will have a look at the system and see how to start
- We should do some SharePoint testing prior to release of 0.7 – i.e. extracting SIPs and processing them in Archivematica
- There is no problem with CVA staff reporting issues via e-mail or otherwise anecdotally and having Artefactual turn them into issues
- Certain processes should be documented on the wikis: physical transfers (CVA wiki), appraisal and selection (CVA wiki), SIP creation requirements (Archivematica)
- Ongoing transfers (Courtney and Glenn)
- Order hardware: new, longer USB cables; more docks without screws (BlackX, for example, and something else with correct FireWire); labelmaker; storage box for screws and small pieces (Sue)
- Set up remote access to CVA test station (Austin)
- Archivematica 0.7 development meeting (Peter, Austin, Joseph, Evelyn)
- Test SIP ingest using various test records (Evelyn with Courtney and Glenn)
- Review/Appraise for Acquisition Sharepoint (Courtney and Glenn)
- Add documentation to wikis (Evelyn)
- Contact Joe Tennis about metadata elements in AIP (Glenn)
Action items from meeting of Jan. 21, 2010
- Update system hardware wiki page (http://www.artefactual.com/wiki/index.php?title=System_Hardware), especially to reflect different storage media formatting steps for Windows, Linux and Mac (Austin)
- Look into possibility of using MD5sum on removable storage media to facilitate integrity checking of transfers between original system, removable media, processing station, duplicate removable media etc.
- Continue to work with HTTrack to determine best settings for capturing websites (Glenn and Courtney)
- Investigate HTTrack log file to determine whether it can be reduced in size (Glenn and Courtney)
- Add website formats to formats page wiki
- Set up duplication station and run duplication tests (i.e. copying records from one removable hard drive to another removable hard drive) (Sue, Glenn, Courtney)
- Continue researching significant properties of file formats listed in the file format wiki page and documenting/proposing preservation plans for the formats
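The MD5sum item above could be approached along the following lines. This is only a minimal sketch, not a tool that exists at CVA or in Archivematica: it assumes each copy of a transfer (original system, removable media, processing station, duplicate media) is mounted as an ordinary directory, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source, copy):
    """Report files that are missing from, extra in, or altered in the copy."""
    return {
        "missing": sorted(set(source) - set(copy)),
        "extra": sorted(set(copy) - set(source)),
        "changed": sorted(
            name for name in source.keys() & copy.keys()
            if source[name] != copy[name]
        ),
    }
```

A manifest built on the original system can be saved alongside the records and compared again after each hop, so a failed copy is caught before the donor's drive is returned.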
Action items from meeting of Aug. 13, 2009
- find out about removing older versions of docs and their metadata from TRIM export
- find out about getting records with lower-case file extensions included in TRIM export
- finish TRIM-specific activity diagram
- find out about global conversion of vmbx to msg formats for exported e-mail
- get TRIM metadata dictionary from HP (since the software is DoD-compliant, we should be able to get a TRIM/DoD crosswalk from HP)
- consult with Terra to find out more about how users will export records from TRIM (ie what's the organizing principle)
- find out how restricted information is identified in TRIM, and how we can avoid getting a SIP with a bunch of personal information in it, or at least know if there is personal information in it
- run test exports from TRIM through Archivematica 0.3.2, to test the system and to come to a better understanding of what should constitute the SIP and AIP (note: Peter & Evelyn should obtain copies of the test exports so they can work with them as well)
- review msg files to see if attachments can be removed after export
- other technical issues (added to http://code.google.com/p/archivematica/issues/list#)
- add TRIM metadata to InterPARES/PREMIS metadata crosswalk
- find out more about how National Archives of Australia manages quarantine process
- update documentation to add notes from this meeting (anyone can update the documentation at any time, though)
- find out what kind of storage the City has - Windows, Linux? That may make a difference to whether viruses could affect stored files
Notes and action items from meeting of Sept. 25, 2009
- investigate TRIMport as export method (Terra, Glenn, Courtney)
- test archivematica 0.3.5 and provide feedback for 0.4 (Glenn, Courtney, Sue)
- write documentation for release 0.3.5 (Glenn, Courtney, Sue)
- find out whether vmbx files can be excluded from TRIM exports (Terra)
- add CoP metadata to metadata crosswalk table by Oct. 21 (Sherry, Harrison, Adam)
- add metadata to documentation (Sherry, Harrison, Adam)
- test storage from archivematica to City network (Sue)
- get an infected file to test ClamAV (Sue)
- prepare Archivematica 0.4 (Austin, Evelyn)
- write analysis of whether TRIM will serve as a digital archives system (for final report) (Courtney)
- provide draft final report by Oct. 21 (Peter)
- review draft of final report (Heather, Glenn, Courtney, Sue, Jim Suderman)
- Friday Sept 25: team meeting to demo Archivematica 0.3.5 and review the process to be used for testing (all)
- Monday September 28 - Friday October 9: work with and test Archivematica and report on any desired changes and improvements to process and/or tools (CVA)
- Friday October 16: release Archivematica 0.4 based on feedback from testing (Artefactual)
- Week of October 19-23 run tests again with Archivematica 0.4 and capture user documentation (CVA)
VanDocs Export Technical Requirements for Digital Archives
Excerpt from e-mail from Terra to Ted Williams and Peter Katsaris (IT), Sept. 25:
Below are the technical requirements discussed at today's meeting (ordered by priority). The first two must be verified before the end of October.
1) Network Directory: Export to a city network directory
2) XML Format: The VanDocs object and metadata must be exported together and metadata must be in XML. The metadata elements currently included in the TRIM XML Export format are acceptable.
- we currently have two options for export: 1) TRIM XML Export, which is not meant to be used for bulk exports, and 2) TRIMPort, which has been used already by the City for bulk imports. TRIMPort uses tab delimited format and not XML format. If we need to use TRIMPort for exports, we will need to convert tab delimited to XML. We also need to ensure that the metadata elements extracted using TRIMPort include at least what is exported using TRIM XML.
3) Encryption: All encryption (if present) must be removed
4) Final Revision: Only the final 'revision' metadata & object should be exported (Terra's note: I believe this is possible with TRIMPort and not TRIM XML)
5) Checksum: Export must include a checksum (MD5) for each file. Checksum metadata should be included in the XML metadata (hash value, algorithm used, time generated)
6) Folder Transfers: Exports should be organized and transferred by TRIM folder (and not as individual documents)
7) VMBX and MSG Formats: VMBX metadata & file must be filtered out on export and only include the MSG. If not filtered out at export, then must be filtered out during ingest
8) Security: The Records and Information Management (RIM) Office needs to start identifying groups of documents that have personal information security requirements.
- the records retention and disposition schedule (VanRIMS) identifies Personal Information Banks (PIBs). This identifier needs to be imported from the VanRIMS Admin Tool
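Requirements 2, 5 and 7 above (tab-delimited to XML conversion, MD5 checksum metadata, and filtering out VMBX) could be prototyped roughly as follows. This is a sketch under stated assumptions only: the `filename` column, the `records`/`record`/`field`/`checksum` element names and the `rows_to_xml` function are hypothetical and do not reflect the actual TRIMPort columns or the TRIM XML Export schema.

```python
import csv
import hashlib
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def rows_to_xml(tab_text, file_bytes_by_name):
    """Convert tab-delimited metadata rows to XML with an MD5 checksum per file.

    Column and element names are illustrative, not the real TRIM schema.
    """
    root = ET.Element("records")
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    for row in reader:
        name = row.get("filename") or ""
        # Requirement 7: drop VMBX entries so only the MSG is kept.
        if name.lower().endswith(".vmbx"):
            continue
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Store the column name as an attribute so any header text is valid XML.
            field = ET.SubElement(record, "field", name=column)
            field.text = value
        data = file_bytes_by_name.get(name)
        if data is not None:
            # Requirement 5: hash value, algorithm used, time generated.
            checksum = ET.SubElement(
                record,
                "checksum",
                algorithm="MD5",
                generated=datetime.now(timezone.utc).isoformat(),
            )
            checksum.text = hashlib.md5(data).hexdigest()
    return root
```

Calling `ET.tostring(rows_to_xml(text, files), encoding="unicode")` would give the XML as a string. In practice the file contents would be read from the export directory rather than passed in as a dictionary; the dictionary just keeps the sketch self-contained.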
Action items from meeting of October 19, 2009
- analyse TRIM export metadata and map to CoP and PREMIS
- start with CoV activity diagrams, then map to PREMIS (Glenn, Courtney), CoP (Adam) and TRIM (Adam)
- update workflow diagrams, UML activity diagrams and documentation (Glenn, Courtney, Sue)
- work with IT Security to set up test for saving AIPs to server storage (Sue, Heather)
- test saving AIPs to server storage (Sue, Austin)
- provide draft final report (Peter)
- present final report and demo Archivematica to project manager and VanDocs team in early November (Peter)