Ingest (0.5)

Revision as of 14:23, 19 November 2009 by Evelyn McLellan (talk | contribs)



AD1 Receive SIP

File:Archivematica AD1 ReceiveSIP v1.pdf


Workflow diagram step Description UML diagram references
Producer places SIP in shared folder on host machine
  • The purpose of shared folders is to allow the Producer to drop SIPs into a folder on their host machine or network and have the SIPs automatically appear in a folder in Archivematica.
    • For instructions on setting up shared folders, please go to Virtual appliance instructions.
    • Neither the folder name nor any of the filenames should include spaces or special characters. Underscores are acceptable.
    • Set the shared folder in Archivematica to /home/demo/receiveSIP/.
  • For testing purposes you can avoid setting up shared folders and simply use the test files found in /home/demo/testFiles/.
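The naming rule above can be checked before a SIP is dropped into the shared folder. The following is a minimal sketch, not part of Archivematica: the function name find_invalid_names is hypothetical, and the set of "safe" characters (ASCII letters, digits, underscores, hyphens and dots) is an assumption, since the rule above only excludes spaces and special characters and allows underscores.

```python
import os
import re

# Assumption: a "safe" name uses only ASCII letters, digits, underscores,
# hyphens and dots. Adjust the pattern to match local policy.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def find_invalid_names(sip_dir):
    """Return a sorted list of paths (the SIP folder itself or anything
    inside it) whose names contain spaces or special characters."""
    invalid = []
    top_name = os.path.basename(os.path.normpath(sip_dir))
    if not SAFE_NAME.match(top_name):
        invalid.append(sip_dir)
    for dirpath, dirnames, filenames in os.walk(sip_dir):
        for name in dirnames + filenames:
            if not SAFE_NAME.match(name):
                invalid.append(os.path.join(dirpath, name))
    return sorted(invalid)
```

An empty list means the SIP folder and all of its contents pass the check.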
SIP appears in shared folder in Archivematica
  • SIP will appear in /home/demo/receiveSIP/.
  • If using files from /home/demo/testFiles/, copy a folder from /testFiles/ into /home/demo/receiveSIP/.
Archivist copies SIP from shared folder to SIP backup folder.
  • Copy the SIP in /home/demo/receiveSIP/ and paste it into /home/demo/receiveSIPbackup/. If anything goes wrong during the ingest process, this backup copy can be retrieved and processed.
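The backup copy described above is made by hand in the file manager. A scripted equivalent might look like the sketch below; the function name backup_sip is hypothetical, and the example paths in the usage note are the demo folders named in this section.

```python
import os
import shutil


def backup_sip(sip_path, backup_root):
    """Copy the SIP folder into backup_root and return the backup's path.

    The original SIP is left in place. An existing backup with the same
    name is never overwritten.
    """
    name = os.path.basename(os.path.normpath(sip_path))
    target = os.path.join(backup_root, name)
    if os.path.exists(target):
        raise FileExistsError(f"Backup already exists: {target}")
    shutil.copytree(sip_path, target)
    return target
```

For example, backup_sip("/home/demo/receiveSIP/my_sip", "/home/demo/receiveSIPbackup") would copy a SIP folder named my_sip (a hypothetical name) into the backup folder.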

AD2 Audit SIP

File:Archivematica AD2 AuditSIP v5.pdf


Workflow diagram step Description UML diagram references
Archivist moves SIP from shared folder into quarantine
  • Drag the SIP from /home/demo/receiveSIP/ to /home/demo/quarantine.
SIP is quarantined for 2 minutes
  • In a production system, SIPs would normally be quarantined for a set period of time (for example, four weeks), to allow anti-virus software to be updated with the latest virus profiles.
  • A lock should appear on the SIP folder in quarantine. The archivist will not be able to read or modify the files during this time.
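To make the timing concrete, the sketch below computes when a quarantined SIP would be released. It is an illustration only and does not describe how Archivematica implements the lock; the names DEMO_QUARANTINE, PRODUCTION_QUARANTINE, quarantine_ends and is_released are hypothetical, and the four-week value is simply the example period mentioned above.

```python
from datetime import datetime, timedelta

# Periods taken from this section: 2 minutes in the demo,
# four weeks as an example for a production system.
DEMO_QUARANTINE = timedelta(minutes=2)
PRODUCTION_QUARANTINE = timedelta(weeks=4)


def quarantine_ends(entered_at, period=DEMO_QUARANTINE):
    """Return the moment at which the quarantine period is over."""
    return entered_at + period


def is_released(entered_at, now, period=DEMO_QUARANTINE):
    """Return True once the quarantine period has fully elapsed."""
    return now >= quarantine_ends(entered_at, period)
```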
SIP is scanned for malware
  • At the end of the quarantine period, ClamAV will automatically scan the files for viruses and other malware.
Infected files are sent to possiblevirii folder
  • Infected files will appear in /home/demo/possiblevirii/. If this occurs, do not take any further steps in the ingest process. Inform the Producer that infected files have been found. It is recommended at this point to delete all SIP copies and request that the Producer take steps to review the causes of the problem and eventually resubmit a malware-free SIP.
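In Archivematica the scan runs automatically. For readers who want to reproduce the scan-and-move behaviour by hand, the sketch below wraps ClamAV's clamscan command-line tool. It assumes clamscan is installed and on the PATH; the function names are hypothetical, and this is not the script Archivematica itself uses.

```python
import subprocess


def build_clamscan_command(sip_dir, infected_dir):
    """Build a clamscan call that scans sip_dir recursively, reports only
    infected files, and moves them into infected_dir."""
    return [
        "clamscan",
        "--recursive",
        "--infected",
        f"--move={infected_dir}",
        sip_dir,
    ]


def interpret_exit_code(code):
    """Map clamscan's exit code to a status: 0 means no virus was found,
    1 means at least one virus was found, anything else is an error."""
    if code == 0:
        return "clean"
    if code == 1:
        return "infected"
    return "error"


def scan_sip(sip_dir, infected_dir):
    """Run the scan and return (status, clamscan output)."""
    result = subprocess.run(
        build_clamscan_command(sip_dir, infected_dir),
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode), result.stdout
```

A status of "infected" corresponds to the case described above: stop the ingest and inform the Producer.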

AD3 Accept SIP for Ingest

File:Archivematica AD3 AcceptSIPforIngest v4.pdf


Workflow diagram step Description UML diagram references
SIP contents are identified and validated using FITS
  • FITS (File Information Tool Set) is automatically launched once the quarantine period has ended and the files have been scanned for viruses.
  • FITS incorporates format identification and validation tools such as DROID, JHOVE and the New Zealand Metadata Extractor, comparing the results of each tool and extracting a set of identification, validation and technical metadata. For more information on the FITS tool, see File Information Tool Set (FITS): http://code.google.com/p/fits/
Identification/validation reports are sent to accessions folder
  • The FITS report will appear in /home/demo/accessionreports/. The report appears as a folder with a 10-digit number; inside the folder is a report for each file in the SIP.
  • Note that each report contains an MD5 checksum for the file.
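The checksum in each report can be compared against a freshly computed value to confirm that a file has not changed. The sketch below shows one way to compute an MD5 checksum; the function name md5_of_file is hypothetical, and how the checksum is read out of the FITS report is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string.

    The file is read in chunks so that large files do not have to fit
    in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the returned value matches the checksum recorded in the report, the file is byte-for-byte identical to the one that was accessioned.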
Virus-checker report is sent to accessions folder
  • A report on ClamAV's virus scan will appear automatically: /home/demo/accessionreports/virus.log.
Accession log is sent to accessions folder
  • A report on the accession process will appear automatically: /home/demo/accessionreports/accession.log.
  • For each file in the SIP, the accession log will state "Accession of /tmp/accession-[FITS folder number]/[SIP number]/[filename] completed successfully."
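A quick way to confirm that every file was accessioned is to compare the filenames in the SIP with the success lines in the log. The sketch below assumes the log lines follow the wording quoted above and that filenames contain no spaces, as required earlier in this section; the function names are hypothetical.

```python
import re

# Matches the success message quoted above and captures the file path.
SUCCESS_LINE = re.compile(
    r"Accession of (/tmp/accession-\S+) completed successfully"
)


def accessioned_files(log_text):
    """Return the paths reported as successfully accessioned."""
    return [match.group(1) for match in SUCCESS_LINE.finditer(log_text)]


def missing_from_log(log_text, expected_filenames):
    """Return the expected filenames that have no success line in the log."""
    logged = {path.rsplit("/", 1)[-1] for path in accessioned_files(log_text)}
    return sorted(set(expected_filenames) - logged)
```

An empty result from missing_from_log means every expected file has a matching success line.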

Go to: 3.5 - Extract metadata

3.5 Extract metadata

Extract preservation metadata from content objects in the SIP

Go to: 3.6 - Audit submission and select for preservation

  • This step seems to have been accomplished in steps 3.4 and 3.6.
  • In Archivematica, the metadata extraction activity takes place at the same time as format identification, as the NLNZ Metadata Extractor tool also performs format identification.
  • What is the necessary metadata that must be extracted at this point?
3.6 Audit submission and select for preservation (UC-4.6)

Based on the results of steps 3.2, 3.3, and 3.4, apply Archives policies and determine which (if any) content objects in the SIP should not be included in the AIP. Document which content objects will not be included and why.

If all SIP content is to be included in the AIP, go to: 3.10 - Accept selected SIP components for ingest


If some SIP content is to be excluded from the AIP, and submission agreement requires notifying Producer of appraisal decision, go to: 3.7 - Notify Producer of appraisal decision

  • Else, go to: 3.9 - Destroy unselected SIP components
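The branching after the audit can be summarised as a small decision function. This is only a restatement of the three "go to" rules above; the function and parameter names are hypothetical.

```python
def next_step_after_audit(excluded_objects, notify_required):
    """Return the number of the step that follows step 3.6.

    excluded_objects: content objects that will not be included in the AIP.
    notify_required: True if the submission agreement requires notifying
    the Producer of the appraisal decision.
    """
    if not excluded_objects:
        return "3.10"  # Accept selected SIP components for ingest
    if notify_required:
        return "3.7"   # Notify Producer of appraisal decision
    return "3.9"       # Destroy unselected SIP components
```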

Possible reasons for exclusion:

  • technical
    • insufficient preservation metadata
    • unrecognized format
    • unsupported format
    • invalid format
  • appraisal
    • duplicate content
    • Does file-level selection occur at this point (e.g. methodological sampling of records at the file level for selective retention) or before the SIPs get to this point?
    • Further to the above, this diagram only deals with one SIP at a time. How do we manage appraisal decisions that must take into consideration the relationships among many SIPs?
3.7 Notify Producer about appraisal decision(s)

Provide the Producer with copies of the appraisal report or other documentation, as required by the submission agreement, identifying SIP components to be destroyed.

If the Producer appeals the appraisal decision, go to: 3.8 - Evaluate appeals

Else, go to: 3.9 - Destroy unselected SIP components

3.8 Evaluate appeals
  • Receive Producer's appeals
  • Evaluate appeals
  • Based on the evaluation, make any necessary amendments to the appraisal decision
  • Notify the Producer of the outcome of the appeal process

Go to: 3.9 - Destroy unselected SIP components

3.9 - Destroy unselected SIP components
  • Destroy all SIP content objects identified for destruction in the appraisal decision (or amended appraisal decision as appropriate).
  • Document destruction
  • Note: it is possible that no components have been identified for destruction. In this case, move on to step 3.10.
  • Do we need to destroy accompanying metadata as well?
3.10 Accept selected SIP components for ingest

AD4 Generate AIP

File:Archivematica AD4 GenerateAIP v2.pdf


Step Implementation Notes
4.1 Create AIP containers
  • If 1 SIP = 1 AIP, this step is not necessary. It is only necessary if the SIP is being divided into multiple AIPs.
4.2 Add Content Information to AIP
  • See note for step 4.1, above.
4.3 Transform Content Information
  • In /home/demo/ingest/2009_01, create a folder named normalized.
  • Using Xena, normalize the contents of each AIP.
    • Set destination folder for the normalization to /home/demo/ingest/2009_01/normalized/.
    • Set the destination folder for the Xena log file to /home/demo/ingest/2009_01/PDI/.
4.4 Add Transformed Content Information to AIP
  • See step 4.3, above.
4.5 Add PDI to AIP
  • Create a plain text report containing provenance and other PDI elements (including arrangement information) and save it to the AIP.
4.6 Generate Descriptive Information (UC-1.4)
  • Obtain available descriptive information from PDI added to AIP, Submission Agreement and/or records schedule, communications with donor, etc.
  • In Qubit, enter descriptive information at aggregate levels of description (fonds, series, file). Then create item-level descriptions for each object to be uploaded.
  • Upload digital objects from /home/demo/ingest/2009_01/content/.
    • In Qubit 1.0.7, batch upload of digital objects is not possible. The objects have to be uploaded one at a time.
  • Add additional descriptive information to item-level descriptions.
  • For instructions on using Qubit, go to the on-line user manual.

AD5 Transfer AIP to Archival Storage

File:Archivematica AD5 TransferAIPtoArchivalStorage v2.pdf


Step Implementation Notes
5.1 Request storage of AIP (UC-1.5)
5.2 Transfer AIP to Archival storage (UC-1.5)
  • Right click on folder /home/demo/ingest/2009_01.
  • Click Scripts > BagIt.
  • A bag will be created and stored automatically in /home/demo/mybags/.
  • Copy the AIP to /home/demo/ingest/archivalstorage/.
  • In Archivematica 0.4, the BagIt script may be configured to drop the bag directly into /home/demo/ingest/archivalstorage/.
  • The BagIt script can also be edited manually if desired: open demo/.gnome2/nautilus-scripts/Bagit and change "mybags" to "archivalstorage".
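For readers curious about what a bag contains, the sketch below builds a very small BagIt-style bag with the Python standard library: a data/ folder holding the payload, a bagit.txt declaration and an MD5 payload manifest. It illustrates the bag layout only and is not the Nautilus script shipped with Archivematica; the function name make_bag and the declared version (0.97) are assumptions.

```python
import hashlib
import os
import shutil


def make_bag(source_dir, bags_root):
    """Copy source_dir into a new bag under bags_root; return the bag path."""
    name = os.path.basename(os.path.normpath(source_dir))
    bag_dir = os.path.join(bags_root, name)
    data_dir = os.path.join(bag_dir, "data")
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir

    # Bag declaration. The version number here is an assumption.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

    # Payload manifest: one "<md5>  <path relative to the bag>" line per file.
    lines = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            with open(full_path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            rel_path = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
            lines.append(f"{digest}  {rel_path}")
    manifest = os.path.join(bag_dir, "manifest-md5.txt")
    with open(manifest, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(lines)) + "\n")
    return bag_dir
```

Passing "/home/demo/ingest/archivalstorage" as bags_root instead of "/home/demo/mybags" would have the same effect as the manual script edit described above.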
5.3 Confirm receipt and storage of AIP (UC-1.5)
5.4 Add AIP storage location to descriptive information (UC-1.6)
  • In the physical storage area in Qubit, add the storage location.
5.5 Add Descriptive Information to Data Management (UC-1.6)
  • This was done in AD4, step 4.6. In OAIS, generating descriptive information and adding it to data management are two different steps; however, in Archivematica both are done in one step in Qubit, which is used to upload images to a web interface, generate derivatives for searching and browsing, and record descriptive information.
5.6 Confirm update of Data Management
5.7 Destroy SIP and AIP copies
  • Destroy /home/demo/ingest/2009_01 and /home/demo/mybags/2009_01.
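Because destruction is irreversible, a scripted version of this step might first check that the stored AIP is really in place, in the spirit of step 5.3. The sketch below is one cautious way to do that; the function name destroy_copies and the existence check are assumptions, not documented Archivematica behaviour.

```python
import os
import shutil


def destroy_copies(copies, stored_aip):
    """Delete the working copies, but only if the stored AIP exists.

    copies: folders to destroy, e.g. the ingest folder and the bag copy.
    stored_aip: path of the AIP in archival storage.
    Returns the list of folders that were actually removed.
    """
    if not os.path.exists(stored_aip):
        raise FileNotFoundError(
            f"AIP not found in archival storage: {stored_aip}"
        )
    removed = []
    for path in copies:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```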