Bag ingest

Latest revision as of 15:41, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

Feature description

Archivematica accepts transfers packaged in accordance with the BagIt specification.

Requirements

  • All standard BagIt checks are run: verifyvalid, checkpayloadoxum, verifycomplete, verifypayloadmanifests, verifytagmanifests.
  • Archivematica differentiates between mandatory and optional bag elements, so that a bag missing optional elements does not fail the verification micro-service.
  • The BagIt checks generate log files that are added to the logs directory of the transfer.
  • The BagIt file manifest (manifest-sha512.txt) is placed in the metadata directory of the transfer.
  • The other BagIt files (bag-info.txt, bagit.txt, tagmanifest-md5.txt) are placed in a logs/BagIt directory.
  • No new PREMIS events are required. The BagIt checks are recorded as a fixity check in PREMIS.
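
As a rough illustration of what two of the checks listed above amount to, the following Python sketch recomputes a Payload-Oxum and compares payload files against manifest-sha512.txt. It is not Archivematica's implementation and does not call the BagIt tool itself; the function names payload_oxum and verify_payload_manifest are hypothetical, and the sketch assumes the standard bag layout with a data/ payload directory.

```python
import hashlib
from pathlib import Path


def payload_oxum(bag_dir):
    """Return "<total bytes>.<file count>" for the files under data/."""
    files = [p for p in (Path(bag_dir) / "data").rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"


def verify_payload_manifest(bag_dir, algorithm="sha512"):
    """Compare each manifest entry with a freshly computed checksum.

    Returns a list of (path, problem) tuples; an empty list means every
    listed payload file is present and matches its checksum.
    """
    bag_dir = Path(bag_dir)
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <relative path>".
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
            continue
        # Reads the whole file into memory; fine for a sketch, not for large payloads.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

A caller could compare the payload_oxum result with the Payload-Oxum value in bag-info.txt and write any reported problems to a log file in the transfer's logs directory.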


Workflow

In this workflow diagram, the white ovals are manual steps and the grey ovals are automated steps.

[Figure: BagIt.png, workflow diagram]


Parse and index contents of bag-info.txt

  • Enhancements being developed in 2015

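Parse bag-info.txt contents to AIP METS file

  • Labels in the bag-info.txt file are serialized as XML in METS sourceMD, linked to the objects directory of the AIP.
  • Sample bag-info.txt (from https://tools.ietf.org/html/draft-kunze-bagit-10):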
Source-Organization: Spengler University
Organization-Address: 1400 Elm St., Cupertino, California, 95014
Contact-Name: Edna Janssen
Contact-Phone: +1 408-555-1212
Contact-Email: ej@spengler.edu
External-Description: Uncompressed greyscale TIFF images from the Yoshimuri papers colle...
Bagging-Date: 2008-01-15
External-Identifier: spengler_yoshimuri_001
Bag-Size: 260 GB
Payload-Oxum: 279164409832.1198
Bag-Group-Identifier: spengler_yoshimuri
Bag-Count: 1 of 15
Internal-Sender-Identifier: /storage/images/yoshimuri
Internal-Sender-Description: Uncompressed greyscale TIFFs created from microfilm and are...
  • Sample AIP METS file result:
<mets:amdSec ID="amdSec_14">
  <mets:sourceMD ID="sourceMD_1">
    <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="BagIt">
      <mets:xmlData>
        <transfer_metadata>
          <Source-Organization>Spengler University</Source-Organization>
          <Organization-Address>1400 Elm St., Cupertino, California, 95014</Organization-Address>
          <Contact-Name>Edna Janssen</Contact-Name>
          <Contact-Phone>+1 408-555-1212</Contact-Phone>
          <Contact-Email>ej@spengler.edu</Contact-Email>
          <External-Description> Uncompressed greyscale TIFF images from the Yoshimuri papers colle...</External-Description>
          <Bagging-Date>2008-01-15</Bagging-Date>
          <External-Identifier>spengler_yoshimuri_001</External-Identifier>
          <Bag-Size>260 GB</Bag-Size>
          <Payload-Oxum>279164409832.1198</Payload-Oxum>
          <Bag-Group-Identifier>spengler_yoshimuri</Bag-Group-Identifier>
          <Bag-Count>1 of 15</Bag-Count>
          <Internal-Sender-Identifier>/storage/images/yoshimuri</Internal-Sender-Identifier>
          <Internal-Sender-Description>Uncompressed greyscale TIFFs created from microfilm and are...</Internal-Sender-Description>
        </transfer_metadata>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:sourceMD>
</mets:amdSec>
  • When BagIt labels contain characters that are not valid in XML element names, processing continues, but an error message is printed and the labels with invalid content are skipped.
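
The mapping from bag-info.txt labels to child elements of <transfer_metadata> could be sketched in Python as follows. This is a minimal illustration, not the code Archivematica uses: parse_bag_info, bag_info_to_xml and the XML_NAME pattern are hypothetical names, and the pattern is a deliberately conservative ASCII-only approximation of the XML naming rules. Wrapping the result in mets:sourceMD and mets:mdWrap is left out.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Conservative ASCII-only subset of legal XML element names (an assumption;
# the XML specification allows more characters than this).
XML_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def parse_bag_info(text):
    """Return (label, value) pairs; indented lines continue the previous value."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and pairs:
            label, value = pairs[-1]
            pairs[-1] = (label, value + " " + line.strip())
        elif ":" in line:
            label, value = line.split(":", 1)
            pairs.append((label.strip(), value.strip()))
    return pairs


def bag_info_to_xml(text):
    """Build a <transfer_metadata> element, skipping labels that are not valid XML names."""
    root = ET.Element("transfer_metadata")
    for label, value in parse_bag_info(text):
        if not XML_NAME.match(label):
            # Report the problem and carry on with the remaining labels.
            print(f"Skipping invalid label: {label!r}", file=sys.stderr)
            continue
        ET.SubElement(root, label).text = value
    return root
```

Given a bag-info.txt containing a label such as "Bag Size" (with a space), this sketch prints a message to standard error, omits that label, and still emits the remaining fields.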


Search contents in archival storage tab

  • Add keyword field "Transfer metadata" to drop-down menu in search. This will search all the contents of the <transfer_metadata> container in the METS file (as indexed in Elasticsearch).
  • Add keyword field "Transfer metadata (other)" to drop-down menu in search. This will allow users to search individual fields in the <transfer_metadata> container.
    • When the user selects "Transfer metadata (other)" a separate box will appear which will allow the user to enter the label of the specific field to be searched.
  • Add ability to search date ranges.
    • To search on a date range in <transfer_metadata> or one of its sub-fields, the user enters two dates in ISO date format separated by a colon. For example, "2015-01-03:2015-04-14".
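
The date-range syntax described above can be illustrated with a short Python sketch that splits the user's input into a start and end date. The function name parse_date_range is hypothetical, and how the resulting pair is turned into a search-index query is not shown here, since the page does not specify it.

```python
from datetime import date


def parse_date_range(query):
    """Split "YYYY-MM-DD:YYYY-MM-DD" into a (start, end) pair of dates."""
    start_text, sep, end_text = query.partition(":")
    if not sep:
        raise ValueError("expected two ISO dates separated by a colon")
    # date.fromisoformat raises ValueError for anything that is not YYYY-MM-DD.
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip())
    if start > end:
        raise ValueError("start date is after end date")
    return start, end
```

For the example input "2015-01-03:2015-04-14", the function returns the pair of dates 3 January 2015 and 14 April 2015; malformed input raises ValueError rather than being guessed at.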