Difference between revisions of "Ingest (0.5)"

From Archivematica
 
(56 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Main Page]] > [[Documentation]] > [[Release 0.3 Documentation]] > Ingest (0.3)
+
[[Main Page]] > [[Documentation]] > [[Release 0.5 Documentation]] > Ingest (0.5)
  
  
  
===AD1 Receive SIP===  
+
===Setting up shared folders===
[[File:Archivematica_AD1_ReceiveSIP_v1.pdf|Archivematica UML Activity diagram AD1 Receive SIP]]
+
In order to work through all of the steps in the tables below, you will need to set up two shared folders in Archivematica.  
 
+
*The purpose of shared folders is to allow you to place digital objects into a folder on your host machine and have the objects automatically appear in a folder in Archivematica, and vice versa.
'''Expected Procedures'''
+
*The two folders in Archivematica which need to be set up as shared folders are /home/demo/ingestSIP and /home/demo/storeAIP.
 
+
**/home/demo/ingestSIP is used to ingest SIPs from the host machine into Archivematica.
'''Case 1:Internal Producer'''
+
**/home/demo/storeAIP is used to drop AIPs into a folder in Archivematica and have them appear back in the host machine.
*[1.1] --> [1.3] --> [1.4] --> [1.6] --> [1.8] --> [end]
+
*Recommended names for the folders on the host machine are sendSIP and archivalstorage.
**Assumption - Submission agreements with internal producers will mandate that some type of checksum for validating the integrity of the SIP must be included in the SIP; therefore, this case should never have to invoke step 1.5 - Generate Checksum
+
*For instructions on setting up shared folders, please go to [[Virtual appliance instructions#Import_files_into_virtual_appliance_.28optional.29|Virtual appliance instructions]].
 
+
*For testing purposes you can avoid setting up shared folders and simply use the test files found in /home/demo/testFiles/. However, you will not be moving SIPs into Archivematica or moving stored AIPs out of it.
 
+
<br />
'''Case 2: External Producer''' (may or may not include a checksum as part of the SIP)
 
*[1.1] --> [1.2] --> [1.4] --> [1.6] --> [1.8] --> [end]  (SIP includes checksum); or,
 
*[1.1] --> [1.2] --> [1.5] --> [1.6] --> [1.8] --> [end]  (SIP does not include checksum)
 
 
 
'''Exceptions'''
 
*(for both cases) Submitted SIP includes integrity checksums and fails integrity check at [1.6]
 
  
 +
===Activity diagram 1 Receive SIP===
 +
[[Media:Archivematica_AD1_ReceiveSIP_v1.pdf|Archivematica UML Activity diagram AD1 Receive SIP]]
  
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
Line 25: Line 21:
 
|- style="background-color:#cccccc;"
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Workflow diagram step'''
 
!style="width:20%"|'''Workflow diagram step'''
!style="width:40%"|'''Description'''
+
!style="width:55%"|'''Description'''
!style="width:40%"|'''UML diagram references'''
+
!style="width:25%"|'''Activity diagram references'''
|-
 
|-
 
|'''Producer places SIP in shared folder on host machine'''
 
|
 
*The purpose of shared folders is to allow the Producer to drop SIPs into a folder on their host machine or network and have the SIPs automatically appear in a folder in Archivematica.
 
**For instructions on setting up shared folders, please go to [[Virtual appliance instructions#Import_files_into_virtual_appliance_.28optional.29|Virtual appliance instructions]].
 
**Neither the folder name nor any of the filenames should include spaces or special characters. Underscores are ok.
 
**Make the shared folder in Archivematica /home/demo/receiveSIP/.
 
*For testing purposes you can avoid setting up shared folders and simply use the test files found in /home/demo/testFiles/.
 
|
 
|-
 
|'''SIP appears in shared folder in Archivematica'''
 
|
 
*SIP will appear in /home/demo/receiveSIP/.
 
*If using files from /home/demo/testFiles/, copy a folder from /testFiles/ into /home/demo/receiveSIP/.
 
|
 
|-
 
| '''Archivist copies SIP from shared folder to SIP backup folder.'''
 
*Copy the SIP in /home/demo/receiveSIP/ and paste it into /home/demo/receiveSIPbackup/. If anything goes wrong during the ingest process, this backup copy can be retrieved and processed.
 
|
 
|-
 
|'''Archivist moves SIP from shared folder into quarantine'''
 
|
 
*Drag the SIP from /home/demo/receiveSIP/ to /home/demo/quarantine.
 
|
 
 
|-
 
|-
|'''SIP is quarantined for 2 minutes'''
+
|Producer places SIP in shared folder on host machine
 
|
 
|
*In a production system, SIPs would normally be quarantined for a set period of time (for example, four weeks), to allow anti-virus software to be updated with the latest virus profiles.
+
*Place a folder of digital files into the shared ingest folder on the host machine.
*A lock should appear on the SIP folder in quarantine. The archivist will not be able to read or modify the files during this time.
+
*Note that the SIP does not need to be prepared in any way prior to ingest - i.e. you do not need to prepare it as a METS file or otherwise process the SIP. A simple folder with one or more files in it is fine.
 
|
 
|
 
|-
 
|-
|'''SIP is scanned for malware'''
+
|SIP appears in shared folder in Archivematica
|
 
*At the end of the quarantine period, ClamAV will automatically scan the files for viruses and other malware.
 
 
|
 
|
 +
*SIP will appear in /home/demo/ingestSIP/.
 +
*To navigate to this folder, click Places > Home folder.
 +
|1.4 Receive SIP from Producer (UC-1.1)
 
|-
 
|-
|'''Infected files are sent to possiblevirii folder'''
+
|Archivist copies SIP from shared folder to SIP receipt folder
 
|
 
|
*Infected files will appear in /home/demo/possiblevirii/. If this occurs, do not take any further steps in the ingest process. Inform the Producer that infected files have been found. It is recommended at this point to delete all SIP copies and request that the Producer take steps to review the causes of the problem and eventually resubmit a malware-free SIP.
+
*Copy SIP from /home/demo/ingestSIP/ to /home/demo/receiveSIP/.
 +
*/home/demo/ingestSIP acts as a backup SIP copy. If anything goes wrong during the ingest process, this backup copy can be retrieved and processed.
 
|
 
|
 
|-
 
|-
 +
|}<br />
  
|}
+
===Activity diagram 2 Audit SIP===  
 
+
[[Media:Archivematica_AD2_AuditSIP_v5.pdf|Archivematica UML Activity diagram AD2 Audit SIP]]
===AD2 Audit SIP===  
 
[[File:Archivematica_AD2_AuditSIP_v5.pdf|Archivematica UML Activity diagram AD2 Audit SIP]]
 
 
 
'''Expected Procedure'''
 
*[2.1] --> [2.2] --> [2.4] --> [2.5] --> [2.8] --> [end]
 
 
 
 
 
'''Exceptions'''
 
*malware detected at step 2.3
 
*SIP not compliant at step 2.5
 
 
 
  
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|-
 
|- style="background-color:#cccccc;"
 
|- style="background-color:#cccccc;"
!style="width:20%"|'''Step'''
+
!style="width:20%"|'''Workflow diagram step'''
!style="width:40%"|'''Implementation'''
+
!style="width:55%"|'''Description'''
!style="width:40%"|'''Notes'''
+
!style="width:25%"|'''Activity diagram references'''
 
|-
 
|-
|'''General procedures'''
+
|Archivist moves SIP from SIP receipt folder into quarantine
 
|
 
|
 +
*Drag the SIP from /home/demo/receiveSIP/ and drop it into /home/demo/quarantine.
 +
*Note that you must drag and drop, not copy and paste, in order to trigger the quarantine process.
 +
|2.1 Quarantine SIP
 +
|-
 +
|SIP is quarantined for 2 minutes
 
|
 
|
 +
*In a production system, SIPs would normally be quarantined for a set period of time (for example, four weeks), to allow anti-virus software to be updated with the latest virus profiles.
 +
*A lock should appear on the SIP folder in quarantine. The archivist will not be able to read or modify the files during this time.
 +
|2.1 Quarantine SIP
 
|-
 
|-
|'''2.1''' Quarantine SIP
+
|SIP is scanned for malware
 
|
 
|
*Leave SIP in current location (/home/demo/ingest/2009_0001). Wait for quarantine period to expire.
+
*At the end of the quarantine period, ClamAV will automatically scan the files for viruses and other malware.
*'''Go to: 2.2 - Check SIP for malware'''
+
|2.2 Check SIP for malware
|
 
*In Archivematica 4:
 
**Force 0000 permissions on folder
 
**Cron to crawl(w/ clam) quarantine folder for malware
 
*A background CRC may be acceptable in lieu of an MD5 or other cryptographic hash to verify post-transfer integrity
 
*For information on how National Archives of Australia manages the quarantine process, see [http://www.naa.gov.au/records-management/secure-and-store/e-preservation/at-NAA/process.aspx http://www.naa.gov.au/records-management/secure-and-store/e-preservation/at-NAA/process.aspx]
 
 
|-
 
|-
|'''2.2''' Check SIP for malware
+
|Infected files are sent to possiblevirii folder
 
|
 
|
NOTE: There is no malware checking software included in the Archivematica 0.3 release, so steps 2.2 and 2.3 are hypothetical only.
+
*Infected files will appear in /home/demo/possiblevirii/. If this occurs, do not take any further steps in the ingest process. Inform the Producer that infected files have been found. It is recommended at this point to delete all SIP copies and request that the Producer take steps to review the causes of the problem and eventually resubmit a malware-free SIP.
*Check SIP for the presence of malware
 
*Create malware check report, copy report to the Accession Documentation folder
 
*'''''If malware is detected,'' go to: 2.3 - Remove malware'''
 
*'''''If malware is not detected,'' go to: 2.4 - Audit SIP for compliance'''
 
 
|
 
|
*Documentation should include:
+
2.4 Audit SIP for compliance<br />
**list of software used to detect malware
+
2.5 Assess SIP deficiencies<br />
***related virus/malware definition used to perform the check
+
2.6 Notify Producer of SIP rejection<br />
**date and time of check
+
2.8 Destroy SIP copies
**reports generated by the software identifying infected files, nature of the infection
 
*"Because of the age of the digital components contained in the SIP, the Archive must ensure that malicious code check can recognize very old malicious code." Tufts/Yale Ingest Guide For University Electronic Records, B2.3 ([http://dca.lib.tufts.edu/features/nhprc/reports/ingest/part_B-02-03.html http://dca.lib.tufts.edu/features/nhprc/reports/ingest/part_B-02-03.html])
 
 
|-
 
|-
|'''2.3''' Remove malware
+
|Virus-checker report is sent to accessions folder
 
|
 
|
Attempt to remove malware from the SIP
+
*A report on ClamAV's virus scan will appear automatically: /home/demo/accessionreports/virus.log.
 +
|2.4 Audit SIP for compliance
 +
|-
 +
|}<br />
  
*Document the tools and procedures used to remove the malware
+
===Activity diagram 3 Accept SIP for Ingest===
*Document success or failure of the malware removal for each infected file
+
[[Media:Archivematica_AD3_AcceptSIPforIngest_v4.pdf|Archivematica UML Activity diagram AD3 Accept SIP for Ingest]]
*Copy all malware removal report to the Accession Documentation folder
 
  
'''Go to: 2.4 - Audit SIP for compliance'''
+
{| border="1" cellpadding="10" cellspacing="0" width=90%
|
 
Case: malware not removed
 
*Create a plain text report (click on ''applications > accessories > text editor'') describing the type(s) of malware, the efforts made to remove the malware and the reasons for failing to remove it. Save the report to /home/demo/accessions/2009_01.
 
*Presumably any malware removal tools generate reports on success/failure. Depending on the software used, the detection occurring in step 2.2 and the removal occurring in this step may be documented in the same report
 
 
|-
 
|-
 
+
|- style="background-color:#cccccc;"
 
+
!style="width:20%"|'''Workflow diagram step'''
|'''2.4''' Audit SIP for compliance ([[UC-1.3]]; [[UC-4.6]])
+
!style="width:55%"|'''Description'''
|Manually verify that the SIP conforms to the archives' data formatting and documentation standards and meets the specifications of the Submission Agreement. Do this by skimming the filenames and extensions to make sure that what was supposed to be in the SIP according to the Submission Agreement is actually there.
+
!style="width:25%"|'''Activity diagram references'''
 
 
Create audit documentation
 
*Create an audit documentation report as plain text file (click on ''applications > accessories > text editor'')
 
**If the SIP is wholly compliant, note this in the audit report; else
 
**Document deficiencies in the SIP, identifying the nature of the deficiency (missing file extensions, unacceptable formats, unacceptable packaging, presence of unremoved malware, etc.)  and the object(s) the deficiency pertains to (file or group of files, SIP packaging, etc.)
 
*Save the SIP audit documentation to the Accession documentation folder (e.g., /home/demo/accessions/2009_0001/)
 
 
 
'''Go to: 2.5 - Assess SIP deficiencies'''
 
 
 
|
 
*What else should we check for?
 
*Maybe develop a checklist approach to reporting on deficiencies - for example: "contains unacceptable formats YES; records inadequately identified YES..."
 
 
|-
 
|-
|'''2.5''' - Assess SIP deficiencies
 
|
 
Based on the results of the audit performed in step 2.4 - Audit SIP for compliance, determine if the deficiencies identified, if any, warrant rejection of the SIP, or if the SIP can be accepted despite identified deficiencies.
 
If the majority of objects conform to standards and Submission Agreement, the SIP may be considered acceptable. Note that the non-conforming objects can be deleted after appraisal (see AD3 step 3.9).
 
Document the decision to accept or reject the SIP
 
 
'''''If SIP can be accepted for ingest,'''''
 
* Document that SIP has been accepted for ingest
 
* '''go to: 2.8 - Notify producer of SIP acceptance'''
 
  
'''''If SIP can not be accepted for ingest, ''go to: 2.7 - Notify Producer of SIP rejection'''
+
|SIP contents are identified and validated using FITS
|
 
|-
 
|'''2.6''' Notify Producer of SIP rejection
 
 
|
 
|
*Document that the SIP has been rejected, copy the report to the Accession Documentation folder
+
*FITS (File Information Tool Set) is automatically launched once the quarantine period has ended and the files have been scanned for viruses.
*Send an e-mail notification, attaching a copy of the reports created in steps 2.4 or 2.5.
+
*FITS incorporates format identification and validation tools such as DROID, JHOVE and the New Zealand Metadata Extractor, comparing the results of each tool and extracting a set of identification, validation and technical metadata. For more information on the FITS tool, see [http://code.google.com/p/fits/ http://code.google.com/p/fits/]
*'''''If the Producer appeals the SIP rejection: ''
 
**Document receipt of the appeal
 
**''' go to: 2.7 - Evaluate appeals'''
 
 
 
*'''''If the Producer does not appeal the SIP rejection: '' go to: 2.9 - Destroy SIP copies'''
 
 
|
 
|
 +
3.3 Identify formats (UC-1.2, step 3)<br />
 +
3.4 Validate formats (UC-1.2, step 3)<br />
 +
3.5 Extract metadata (UC-1.2, step 3)
 
|-
 
|-
|'''2.7''' Evaluate appeals
+
|Identification/validation reports are sent to accessions folder
 
|
 
|
Evaluate the Producer's appeal of the SIP rejection
+
*The FITS report will appear in /home/demo/accessionreports/. The report appears as a folder with a 10-digit number; inside the folder is a report for each file in the SIP.
*'''''If the appeal is accepted:'''''
+
*Note that each report contains an MD5 checksum for the file.
**Document acceptance of the appeal
 
**'''Go to: 2.8 - Notify producer of SIP acceptance'''
 
*'''''If the appeal is rejected:'''''
 
**Document the rejection of the appeal
 
**Notify the Producer of the appeal rejection
 
**'''Go to 2.9 - Destroy SIP copies'''
 
 
 
 
 
 
|
 
|
 +
3.3 Identify formats (UC-1.2, step 3)<br />
 +
3.4 Validate formats (UC-1.2, step 3)<br />
 +
3.5 Extract metadata (UC-1.2, step 3)
 
|-
 
|-
|'''2.8''' Notify producer of SIP acceptance
+
|Accession log is sent to accessions folder
 
|
 
|
Send Producer notification that the SIP has been accepted for ingest
+
*A report on the accession process will appear automatically: /home/demo/accessionreports/accession.log.
Document
+
*For each file in the SIP, the accession log will state "Accession of /tmp/accession-[FITS folder number]/[SIP number]/[filename] completed successfully."
 
 
'''Go to: 3.1 - Extract content information from SIP'''
 
 
|
 
|
 
|-
 
|-
|'''2.9''' Destroy SIP copies
 
|Delete SIP from /home/demo/dropoff and /home/demo/ingest.
 
'''End ingest'''
 
|
 
|}
 
 
===AD3 Accept SIP for Ingest===
 
[[File:Archivematica_AD3_AcceptSIPforIngest_v4.pdf|Archivematica UML Activity diagram AD3 Accept SIP for Ingest]]
 
 
'''Expected Procedure'''
 
  
[3.1] --> [3.2] --> [3.3] --> [3.4] --> [3.5] --> [3.6] --> [3.9] --> [3.10]
+
|}<br />
 
 
'''Exceptions'''
 
 
 
*Producer notification required following [3.6]
 
*Appraisal decision appealed following [3.7]
 
  
 +
===Activity diagram 4 Generate AIP===
 +
[[Media:Archivematica_AD4_GenerateAIP_v2.pdf|Archivematica UML Activity diagram AD4 Generate AIP]]
  
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|-
 
|- style="background-color:#cccccc;"
 
|- style="background-color:#cccccc;"
!style="width:20%"|'''Step'''
+
!style="width:20%"|'''Workflow diagram step'''
!style="width:40%"|'''Implementation'''
+
!style="width:55%"|'''Description'''
!style="width:40%"|'''Notes'''
+
!style="width:25%"|'''Activity diagram references'''
 
|-
 
|-
|'''General procedures'''
+
|SIP is moved to AIP preparation folder
|
 
 
|
 
|
 +
*At the end of the quarantine process, Archivematica automatically drops the SIP into /home/demo/prepareAIP
 +
|4.2 Add content information to AIP
 
|-
 
|-
|'''3.1''' Unpack SIP ([[UC-1.3]])
+
|Archivist normalizes files
|
 
Create folders for the SIP contents
 
*Create the following 2 directories within the current folder (e.g., /home/demo/ingest/2009_01):
 
**/content
 
**/PDI
 
*Unpack the SIP
 
**If the SIP uses an archive file format such as .zip, .tar, etc., extract the contents using the appropriate unpacking software.
 
**Identify the content objects in the SIP and sort them into the '''/content''' directory
 
**Identify the PDI and sort them into the '''/PDI''' directory
 
 
 
'''''Go to:'' 3.2 - Modify/Provide additional PDI'''
 
 
|
 
|
 +
*From Archivematica's Linux desktop, open Xena
 +
*Click Add Directory
 +
*Select /home/demo/prepareAIP/[SIP]/
 +
*In Tools > Xena 4.2.1 Preferences > Xena destination directory enter ''/home/demo/prepareAIP/[SIP]''.
 +
*In Tools > Xena 4.2.1 Preferences > Xena log file enter ''/home/demo/accessionreports/xena_log''.
 +
*Click OK to close Xena 4.2.1 Preferences
 +
*Click Normalise
 +
*Wait for normalization process to be completed (a pop-up dialogue box will open indicating that the process has been completed).
 +
*Click OK to close pop-up window
 +
*Close Xena
 +
|4.3 Transform content information (UC-1.3, step 9)
 
|-
 
|-
|'''3.2''' Modify / provide additional PDI ([[UC-1.3]])
+
|Normalized files are saved to AIP preparation folder
|
 
'''''Go to: '' 3.3 - Identify Formats'''
 
 
|
 
|
 +
*In the SIP, look for files with the extension .xena. These are normalized versions of the original files.
 +
*To view representations of normalized files, open the Xena Viewer from Archivematica's Linux desktop.
 +
|4.3 Transform content information (UC-1.3, step 9)
 
|-
 
|-
|'''3.3''' Identify format
+
|Normalization log is saved to accessions folder
|Use DROID and NLNZ Metadata extractor to identify objects in the SIP. Save the reports to /home/demo/ingest/2009_01/PDI/.
 
 
 
'''''Go to:'' 3.4 - Validate formats
 
 
|
 
|
|-
+
*A log file showing all the actions taken by Xena will appear: /home/demo/accessionreports/xena_log.0.
|'''3.4''' Validate format
 
|Use JHOVE to validate the objects in the SIP. Save the report to /home/demo/ingest/2009_01/PDI/.
 
 
 
'''''Go to:'' 3.5 - Extract metadata'''
 
 
|
 
|
 
|-
 
|-
|'''3.5''' Extract metadata
+
|Archivist moves PDI from accessions folder to SIP in AIP preparation folder
|
 
Extract preservation metadata from content objects in the SIP
 
 
 
'''''Go to:'' - 3.6 Audit submission and select for preservation
 
 
|
 
|
*This step seems to have been accomplished in steps 3.4 and 3.6.
+
*In Archivematica, all the contents relating to the SIP in /home/demo/accessionreports/ is considered PDI (Preservation Description Information).
*In Archivematica, the metadata extraction activity takes place at the same time as format identification, as the NLNZ Metadata extractor tool also performs format identification.
+
**Cut these contents and paste them to /home/demo/prepareAIP/[SIP].
*What is the necessary metadata that must be extracted at this point?
+
|4.5 Add PDI to AIP
 
|-
 
|-
|'''3.6''' Audit submission and select for preservation ([[UC-4.6]])
+
|Archivist moves SIP to AIP generation folder
 
|
 
|
Based on the results of steps 3.2, 3.3, and 3.4, apply Archives policies and determine which (if any) content objects in the SIP should not be included in the AIP
+
*Drag the SIP from /home/demo/prepareAIP/ and drop it into /home/demo/generateAIP/.
Document which content objects will not be included and why.
+
*Note that you must drag and drop, not copy and paste, in order to trigger the AIP generation process.
 
 
'''''If all SIP content is to be included in the AIP, go to:'' 3.10 - Accept selected SIP components for ingest'''
 
 
 
 
 
'''''If some SIP content is to be excluded from the AIP, and submission agreement requires notifying Producer of appraisal decision, go to:'' 3.7 - Notify Producer of appraisal decision'''
 
*'''''Else, go to:'' 3.9 - Destroy unselected SIP components'''
 
 
 
|
 
Possible reasons for exclusion:
 
*technical
 
**insufficient preservation metadata
 
**unrecognized format
 
**unsupported format
 
**invalid format
 
*appraisal
 
**duplicate content
 
**does file level selection occur at this point (e.g. methodological sampling of records at the file level for selective retention?) or before the SIPs get to this point?
 
**further to the above, this diagram only deals with one sip at a time. How do we manage appraisal decisions that must take into consideration the relationships among many SIPs?
 
 
 
|-
 
 
 
|'''3.7''' Notify Producer about appraisal decision(s)
 
|Provide the Producer with copies of the appraisal report or other documentation as required by the submission agreement identifying SIP components to be destroyed.
 
 
 
'''''If the Producer appeals the appraisal decision, go to: ''3.8 - Evaluate appeals'''
 
 
 
'''''Else, go to:'' 3.9 - Destroy unselected SIP components'''
 
 
|
 
|
 
|-
 
|-
|'''3.8''' Evaluate appeals
+
|SIP content and PDI are zipped into AIP
|
 
*Receive Producer's appeals
 
*Evaluate appeals
 
*Based on the evaluation, make any necessary amendments to the appraisal decision
 
*Notify the Producer of the outcome of the appeal process
 
 
 
'''''Go to: 3.9 - Destroy unselected SIP components'''
 
 
|
 
|
 +
*A script called BagIt will run in the background, converting the SIP into a single zipped file. In Archivematica, this zipped file, which also includes metadata generated by the bagging process, constitutes the AIP. For more information about BagIt, see [http://www.digitalpreservation.gov/library/challenge/data-transfer.html http://www.digitalpreservation.gov/library/challenge/data-transfer.html].
 +
|UC-1.3, step 10
 
|-
 
|-
|'''3.9''' - Destroy unselected SIP components
+
|AIP is moved to AIP receipt folder
 
|
 
|
*Destroy all SIP content objects identified for destruction in the appraisal decision (or amended appraisal decision as appropriate).
+
*The bagging process automatically moves the AIP to /home/demo/receiveAIP/.
*Document destruction
+
**To view the AIP, double-click it. When it opens in a separate window, double-click it again; this will allow you to view (but not modify or delete) the contents of the zipped bag.
 
|
 
|
*Note - it is possible that no components have been identified for destruction. In this case, move on to step 3.10
 
*Do we need to destroy accompanying metadata as well?
 
 
|-
 
|-
|'''3.10''' Accept selected SIP components for ingest
+
|}<br />
|
 
|
 
|}
 
 
 
===AD4 Generate AIP===
 
[[File:Archivematica_AD4_GenerateAIP_v2.pdf|Archivematica UML Activity diagram AD4 Generate AIP]]
 
  
 +
===Activity diagram 5 Transfer AIP to Archival Storage===
 +
[[Media:Archivematica_AD5_TransferAIPtoArchivalStorage_v3.pdf|Archivematica UML Activity diagram AD5 Transfer AIP to Archival Storage]]
  
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|-
 
|- style="background-color:#cccccc;"
 
|- style="background-color:#cccccc;"
!style="width:20%"|'''Step'''
+
!style="width:20%"|'''Workflow diagram step'''
!style="width:40%"|'''Implementation'''
+
!style="width:55%"|'''Description'''
!style="width:40%"|'''Notes'''
+
!style="width:25%"|'''Activity diagram references'''
|-
 
|'''4.1''' Create AIP containers
 
|
 
|If 1 SIP = 1 AIP this step is not necessary. It is only necessary if the SIP is being divided into multiple AIPs.
 
|-
 
|'''4.2''' Add Content Information to AIP
 
|
 
|See note for step 4.1, above
 
|-
 
|'''4.3''' Transform Content Information
 
|
 
*In /home/demo/ingest/2009_01, create a folder entitled '''normalized'''.
 
*Using Xena, normalize the contents of each AIP.
 
**Set destination folder for the normalization to /home/demo/ingest/2009_01/normalized/.
 
**Set the destination folder for the Xena log file to /home/demo/ingest/2009_01/PDI/.
 
|
 
|-
 
|'''4.4''' Add Transformed Content Information to AIP
 
|See step 4.3, above.
 
|
 
|-
 
|'''4.5''' Add PDI to AIP
 
|Create a plain text report containing provenance and other PDI elements (including arrangement information) and save it to the AIP.
 
|
 
 
|-
 
|-
|'''4.6''' Generate Descriptive Information ([[UC-1.4]])
+
|Archivist copies AIP to archival storage folder
|
 
*Obtain available descriptive information from PDI added to AIP, Submission Agreement and/or records schedule, communications with donor, etc.
 
*In Qubit, enter descriptive information at aggregate levels of description (fonds, series, file). Then create item-level descriptions for each object to be uploaded.
 
*Upload digital objects from /home/demo/ingest/2009_01/content/.
 
**In Qubit 1.0.7 batch upload of digital objects is not possible. The objects have to be uploaded 1 at a time.
 
*Add additional descriptive information to item-level descriptions.
 
*For instructions on using Qubit, go to the [http://www.ica-atom.org/docs/index.php?title=User_manual on-line user manual].
 
 
|
 
|
 +
*Copy the AIP from /home/demo/receiveAIP/ to /home/demo/storeAIP/.
 +
|5.2 Transfer AIP to archival storage (UC-1.5)
 
|-
 
|-
|}
+
|}<br />
 
 
===AD5 Transfer AIP to Archival Storage===
 
[[File:Archivematica_AD5_TransferAIPtoArchivalStorage_v2.pdf|Archivematica UML Activity diagram AD5 Transfer AIP to Archival Storage]]
 
  
 
+
Go to [[Archival Storage (0.5)]]
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Step'''
 
!style="width:40%"|'''Implementation'''
 
!style="width:40%"|'''Notes'''
 
|-
 
|'''5.1''' Request storage of AIP ([[UC-1.5]])
 
|
 
|
 
|-
 
|'''5.2''' Transfer AIP to Archival storage ([[UC-1.5]])
 
|
 
*Right click on folder /home/demo/ingest/2009_01.
 
*Click ''Scripts > BagIt''.
 
*A bag will be created and stored automatically in /home/demo/mybags/.
 
*Copy AIP to /home/demo/ingest/archivalstorage/.
 
|
 
*In Archivematica 0.4, consider configuring the BagIt script to drop the bag directly into /home/demo/ingest/archivalstorage/.
 
*The BagIt script can also be edited manually if desired: open demo/.gnome2/nautilus-scripts/Bagit and change "mybags" to "archivalstorage".
 
|-
 
|'''5.3''' Confirm receipt and storage of AIP ([[UC-1.5]])
 
|
 
|
 
|-
 
|'''5.4''' Add AIP storage location to descriptive information ([[UC-1.6]])
 
|In the ''physical storage'' area in Qubit, add the storage location.
 
|
 
|-
 
|'''5.5''' Add Descriptive Information to Data Management ([[UC-1.6]])
 
|This was done in AD4, step 4.6. In OAIS, generating descriptive information and adding them to data management are two different steps; however, in Archivematica this is done in one step in Qubit, which is used to upload images to a web interface, generate derivatives for searching and browsing and record descriptive information.
 
|
 
|-
 
|'''5.6''' Confirm update of Data Management
 
|
 
|
 
|-
 
|'''5.7''' Destroy SIP and AIP copies
 
|Destroy /home/demo/ingest/2009_01 and /home/demo/mybags/2009_01.
 
|
 
|-
 
|}
 

Latest revision as of 17:21, 2 August 2012



Setting up shared folders

In order to work through all of the steps in the tables below, you will need to set up two shared folders in Archivematica.

  • The purpose of shared folders is to allow you to place digital objects into a folder on your host machine and have the objects automatically appear in a folder in Archivematica, and vice versa.
  • The two folders in Archivematica which need to be set up as shared folders are /home/demo/ingestSIP and /home/demo/storeAIP.
    • /home/demo/ingestSIP is used to ingest SIPs from the host machine into Archivematica.
    • /home/demo/storeAIP is used to drop AIPs into a folder in Archivematica and have them appear back in the host machine.
  • Recommended names for the folders on the host machine are sendSIP and archivalstorage.
  • For instructions on setting up shared folders, please go to Virtual appliance instructions.
  • For testing purposes you can avoid setting up shared folders and simply use the test files found in /home/demo/testFiles/. However, you will not be moving SIPs into Archivematica or moving stored AIPs out of it.
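
Before working through the tables below, it can help to confirm that both shared folders are visible inside Archivematica. The following is a minimal sketch, not part of Archivematica itself: the function name `missing_shared_folders` is hypothetical, and the only facts taken from the text above are the two folder names and the /home/demo location.

```python
from pathlib import Path

# Folder names taken from the text above. The base directory is a parameter
# so that the check can also be tried outside the virtual appliance.
SHARED_FOLDERS = ("ingestSIP", "storeAIP")


def missing_shared_folders(home: Path = Path("/home/demo")) -> list[str]:
    """Return the names of the expected shared folders that do not exist under home."""
    return [name for name in SHARED_FOLDERS if not (home / name).is_dir()]


if __name__ == "__main__":
    missing = missing_shared_folders()
    if missing:
        print("Missing shared folders:", ", ".join(missing))
    else:
        print("Both shared folders are present.")
```

This only checks that the folders exist; it cannot tell whether they are actually shared with the host machine.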


Activity diagram 1 Receive SIP

Archivematica UML Activity diagram AD1 Receive SIP

Workflow diagram step | Description | Activity diagram references
Producer places SIP in shared folder on host machine
  • Place a folder of digital files into the shared ingest folder on the host machine.
  • Note that the SIP does not need to be prepared in any way prior to ingest - i.e. you do not need to prepare it as a METS file or otherwise process the SIP. A simple folder with one or more files in it is fine.
SIP appears in shared folder in Archivematica
  • SIP will appear in /home/demo/ingestSIP/.
  • To navigate to this folder, click Places > Home folder.
1.4 Receive SIP from Producer (UC-1.1)
Archivist copies SIP from shared folder to SIP receipt folder
  • Copy SIP from /home/demo/ingestSIP/ to /home/demo/receiveSIP/.
  • /home/demo/ingestSIP acts as a backup SIP copy. If anything goes wrong during the ingest process, this backup copy can be retrieved and processed.
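
The copy step in the table above is done by hand in the file browser, but the same operation could be scripted. The sketch below is an illustration only: `copy_sip_to_receipt` is a hypothetical helper, and the SIP name in the usage note is made up. It copies rather than moves, so that the folder in ingestSIP remains as the backup copy described above.

```python
import shutil
from pathlib import Path


def copy_sip_to_receipt(sip_name: str, ingest_dir: Path, receive_dir: Path) -> Path:
    """Copy one SIP folder from the shared ingest folder to the receipt folder.

    The original is left in place so that it can serve as the backup copy.
    """
    source = ingest_dir / sip_name
    target = receive_dir / sip_name
    if not source.is_dir():
        raise FileNotFoundError(f"No SIP folder named {sip_name!r} in {ingest_dir}")
    if target.exists():
        # Refuse to overwrite a SIP that has already been received.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(source, target)
    return target
```

For example, `copy_sip_to_receipt("mySIP", Path("/home/demo/ingestSIP"), Path("/home/demo/receiveSIP"))` would copy a SIP folder named mySIP, assuming such a folder exists.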


Activity diagram 2 Audit SIP

Archivematica UML Activity diagram AD2 Audit SIP

Workflow diagram step | Description | Activity diagram references
Archivist moves SIP from SIP receipt folder into quarantine
  • Drag the SIP from /home/demo/receiveSIP/ and drop it into /home/demo/quarantine.
  • Note that you must drag and drop, not copy and paste, in order to trigger the quarantine process.
2.1 Quarantine SIP
SIP is quarantined for 2 minutes
  • In a production system, SIPs would normally be quarantined for a set period of time (for example, four weeks), to allow anti-virus software to be updated with the latest virus profiles.
  • A lock should appear on the SIP folder in quarantine. The archivist will not be able to read or modify the files during this time.
2.1 Quarantine SIP
SIP is scanned for malware
  • At the end of the quarantine period, ClamAV will automatically scan the files for viruses and other malware.
2.2 Check SIP for malware
Infected files are sent to possiblevirii folder
  • Infected files will appear in /home/demo/possiblevirii/. If this occurs, do not take any further steps in the ingest process. Inform the Producer that infected files have been found. It is recommended at this point to delete all SIP copies and request that the Producer take steps to review the causes of the problem and eventually resubmit a malware-free SIP.

2.4 Audit SIP for compliance
2.5 Assess SIP deficiencies
2.6 Notify Producer of SIP rejection
2.8 Destroy SIP copies

Virus-checker report is sent to accessions folder
  • A report on ClamAV's virus scan will appear automatically: /home/demo/accessionreports/virus.log.
2.4 Audit SIP for compliance
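
This page does not say how Archivematica invokes ClamAV, so the following Python sketch is only an assumption about what an equivalent manual scan could look like. It uses the clamscan command-line tool with its --recursive, --move and --log options. The helper names build_clamscan_command and scan_sip are hypothetical; the folder and log paths are the ones named in the table above.

```python
import subprocess


def build_clamscan_command(sip_path,
                           infected_dir="/home/demo/possiblevirii",
                           log_file="/home/demo/accessionreports/virus.log"):
    """Return a clamscan argument list (hypothetical helper, for illustration)."""
    return [
        "clamscan",
        "--recursive",             # descend into the SIP's subfolders
        f"--move={infected_dir}",  # move infected files out of the SIP
        f"--log={log_file}",       # write the scan report to a file
        str(sip_path),
    ]


def scan_sip(sip_path):
    """Run the scan and return clamscan's exit code.

    clamscan exits with 0 if no virus was found, 1 if a virus was found,
    and 2 if an error occurred.
    """
    result = subprocess.run(build_clamscan_command(sip_path))
    return result.returncode
```

An exit code of 1 corresponds to the case described above in which infected files are found and the ingest process should stop.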


Activity diagram 3 Accept SIP for Ingest

Archivematica UML Activity diagram AD3 Accept SIP for Ingest

Workflow diagram step Description Activity diagram references
SIP contents are identified and validated using FITS
  • FITS (File Information Tool Set) is automatically launched once the quarantine period has ended and the files have been scanned for viruses.
  • FITS incorporates format identification and validation tools such as DROID, JHOVE and the New Zealand Metadata Extractor. It compares the results of each tool and extracts a set of identification, validation and technical metadata. For more information on the FITS tool, see http://code.google.com/p/fits/

3.3 Identify formats (UC-1.2, step 3)
3.4 Validate formats (UC-1.2, step 3)
3.5 Extract metadata (UC-1.2, step 3)

Identification/validation reports are sent to accessions folder
  • The FITS report will appear in /home/demo/accessionreports/. The report appears as a folder named with a 10-digit number; inside the folder is a report for each file in the SIP.
  • Note that each report contains an MD5 checksum for the file.

3.3 Identify formats (UC-1.2, step 3)
3.4 Validate formats (UC-1.2, step 3)
3.5 Extract metadata (UC-1.2, step 3)

Accession log is sent to accessions folder
  • A report on the accession process will appear automatically: /home/demo/accessionreports/accession.log.
  • For each file in the SIP, the accession log will state "Accession of /tmp/accession-[FITS folder number]/[SIP number]/[filename] completed successfully."
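
Because each FITS report contains an MD5 checksum, a file's integrity can be spot-checked by recomputing the checksum and comparing it by eye with the value in the report. The sketch below is illustrative only: md5_of_file is a hypothetical helper, and it does not parse the FITS report, whose layout is not described on this page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files do not have to fit
    in memory. md5_of_file is a hypothetical helper, for illustration.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the value returned for a file differs from the checksum recorded in its FITS report, the file has changed since the report was generated.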


Activity diagram 4 Generate AIP

Archivematica UML Activity diagram AD4 Generate AIP

Workflow diagram step Description Activity diagram references
SIP is moved to AIP preparation folder
  • At the end of the quarantine process, Archivematica automatically drops the SIP into /home/demo/prepareAIP
4.2 Add content information to AIP
Archivist normalizes files
  • From Archivematica's Linux desktop, open Xena
  • Click Add Directory
  • Select /home/demo/prepareAIP/[SIP]/
  • In Tools > Xena 4.2.1 Preferences > Xena destination directory enter /home/demo/prepareAIP/[SIP].
  • In Tools > Xena 4.2.1 Preferences > Xena log file enter /home/demo/accessionreports/xena_log.
  • Click OK to close Xena 4.2.1 Preferences
  • Click Normalise
  • Wait for normalization process to be completed (a pop-up dialogue box will open indicating that the process has been completed).
  • Click OK to close pop-up window
  • Close Xena
4.3 Transform content information (UC-1.3, step 9)
Normalized files are saved to AIP preparation folder
  • In the SIP, look for files with the extension .xena. These are normalized versions of the original files.
  • To view representations of normalized files, open the Xena Viewer from Archivematica's Linux desktop.
4.3 Transform content information (UC-1.3, step 9)
Normalization log is saved to accessions folder
  • A log file showing all the actions taken by Xena will appear: /home/demo/accessionreports/xena_log.0.
Archivist moves PDI from accessions folder to SIP in AIP preparation folder
  • In Archivematica, all the contents relating to the SIP in /home/demo/accessionreports/ are considered PDI (Preservation Description Information).
    • Cut these contents and paste them to /home/demo/prepareAIP/[SIP].
4.5 Add PDI to AIP
Archivist moves SIP to AIP generation folder
  • Drag the SIP from /home/demo/prepareAIP/ and drop it into /home/demo/generateAIP/.
  • Note that you must drag and drop, not copy and paste, in order to trigger the AIP generation process.
SIP content and PDI are zipped into AIP
UC-1.3, step 10
AIP is moved to AIP receipt folder
  • The bagging process automatically moves the AIP to /home/demo/receiveAIP/.
    • To view the AIP, double-click it. When it opens in a separate window, double-click it again; this will allow you to view (but not modify or delete) the contents of the zipped bag.
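
As an alternative to double-clicking the AIP, its contents can be listed from a script. This is a minimal sketch that assumes the AIP is a standard zip archive, which the description "zipped bag" suggests but this page does not confirm. The helper name list_aip_contents is hypothetical. The archive is opened read-only, so nothing in it is modified.

```python
import zipfile


def list_aip_contents(aip_path):
    """Return the sorted names of all entries in a zipped AIP.

    Assumes the AIP is a zip archive; list_aip_contents is a
    hypothetical helper, for illustration only.
    """
    # Mode "r" opens the archive read-only.
    with zipfile.ZipFile(aip_path, "r") as archive:
        return sorted(archive.namelist())
```

Filtering the returned names for the .xena extension would show which normalized files were included in the AIP.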


Activity diagram 5 Transfer AIP to Archival Storage

Archivematica UML Activity diagram AD5 Transfer AIP to Archival Storage

Workflow diagram step Description Activity diagram references
Archivist copies AIP to archival storage folder
  • Copy the AIP from /home/demo/receiveAIP/ to /home/demo/storeAIP/.
5.2 Transfer AIP to archival storage (UC-1.5)
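
The copy to archival storage can be scripted with a verification step. The sketch below assumes the AIP is a single file (for example, one zipped bag) rather than a folder. The helper name store_aip is hypothetical, and the byte-for-byte comparison is an addition for illustration, not something Archivematica is stated to do at this step.

```python
import filecmp
import shutil
from pathlib import Path


def store_aip(aip_path, storage_dir="/home/demo/storeAIP"):
    """Copy an AIP file into the storage folder and verify the copy.

    Assumes the AIP is a single file. store_aip is a hypothetical
    helper, for illustration only.
    """
    source = Path(aip_path)
    target = Path(storage_dir) / source.name
    # copy2 copies the file contents and, where possible, its metadata.
    shutil.copy2(source, target)
    # shallow=False forces a comparison of the actual file contents.
    if not filecmp.cmp(source, target, shallow=False):
        raise OSError(f"Copy of {source} to {target} does not match the original")
    return target
```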


Go to Archival Storage (0.5)