Difference between revisions of "Bag ingest"

From Archivematica
 
(18 intermediate revisions by 2 users not shown)
 
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div>
  
 
==Feature description==
Archivematica accepts transfers packaged in accordance with the BagIt specification.
</br>
 
==Requirements==
*All standard BagIt checks are run: verifyvalid, checkpayloadoxum, verifycomplete, verifypayloadmanifests, verifytagmanifests.
*Archivematica differentiates between mandatory and optional bag elements, so a bag that lacks optional elements does not fail the verification micro-service.
*The BagIt checks generate log files that are added to the ''logs'' directory of the transfer.
*The BagIt file manifest (manifest-sha512.txt) is placed in the ''metadata'' directory of the transfer.
*The other BagIt files (bag-info.txt, bagit.txt, tagmanifest-md5.txt) are placed in a ''logs/BagIt'' directory.
*No new PREMIS events are required. The BagIt checks are recorded as a fixity check in PREMIS.
</br>
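The checks listed above can be illustrated with a minimal Python sketch. It is not Archivematica's implementation: the function names are hypothetical, only the standard library is used, and it covers just three things (mandatory elements, the Payload-Oxum value, and payload manifest checksums). It assumes the mandatory elements are bagit.txt, the data directory, and at least one payload manifest, as the BagIt specification describes, and it does not detect payload files that are missing from the manifest.

```python
import hashlib
from pathlib import Path

# Assumption: bagit.txt, data/ and at least one payload manifest are mandatory;
# bag-info.txt, tag manifests and fetch.txt are treated as optional.
REQUIRED = ("bagit.txt", "data")


def missing_required(bag: Path) -> list[str]:
    """Return the names of mandatory bag elements that are absent."""
    missing = [name for name in REQUIRED if not (bag / name).exists()]
    if not list(bag.glob("manifest-*.txt")):
        missing.append("manifest-<algorithm>.txt")
    return missing


def payload_oxum(bag: Path) -> str:
    """Compute Payload-Oxum: total payload octets, a dot, then the file count."""
    files = [p for p in (bag / "data").rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"


def manifest_errors(bag: Path, algorithm: str = "sha512") -> list[str]:
    """Compare each payload manifest entry against the file on disk."""
    errors = []
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            errors.append(f"missing: {rel_path}")
        # Simplification: the whole file is read into memory before hashing.
        elif hashlib.new(algorithm, target.read_bytes()).hexdigest() != expected.lower():
            errors.append(f"checksum mismatch: {rel_path}")
    return errors
```

A bag that lacks only bag-info.txt or a tag manifest yields an empty list from `missing_required`, which mirrors the requirement that missing optional elements must not fail verification.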
  
 
==Workflow==
In this workflow diagram, the white ovals are manual steps and the grey ovals are automated steps.
[[File:BagIt.png|680px|thumb|center|]]
</br>
==Parse and index contents of bag-info.txt==
*Enhancements being developed in 2015
===Parse bag-info.txt contents to AIP METS file===
*Labels in the bag-info.txt file are serialized as XML in a METS sourceMD element, linked to the objects directory of the AIP.
*Sample bag-info.txt (from [https://tools.ietf.org/html/draft-kunze-bagit-10 https://tools.ietf.org/html/draft-kunze-bagit-10]):
<pre>Source-Organization: Spengler University
Organization-Address: 1400 Elm St., Cupertino, California, 95014
Contact-Name: Edna Janssen
Contact-Phone: +1 408-555-1212
Contact-Email: ej@spengler.edu
External-Description: Uncompressed greyscale TIFF images from the Yoshimuri papers colle...
Bagging-Date: 2008-01-15
External-Identifier: spengler_yoshimuri_001
Bag-Size: 260 GB
Payload-Oxum: 279164409832.1198
Bag-Group-Identifier: spengler_yoshimuri
Bag-Count: 1 of 15
Internal-Sender-Identifier: /storage/images/yoshimuri
Internal-Sender-Description: Uncompressed greyscale TIFFs created from microfilm and are...</pre>
*Sample AIP METS file result:
<pre><mets:amdSec ID="amdSec_14">
  <mets:sourceMD ID="sourceMD_1">
    <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="BagIt">
      <mets:xmlData>
        <transfer_metadata>
          <Source-Organization>Spengler University</Source-Organization>
          <Organization-Address>1400 Elm St., Cupertino, California, 95014</Organization-Address>
          <Contact-Name>Edna Janssen</Contact-Name>
          <Contact-Phone>+1 408-555-1212</Contact-Phone>
          <Contact-Email>ej@spengler.edu</Contact-Email>
          <External-Description> Uncompressed greyscale TIFF images from the Yoshimuri papers colle...</External-Description>
          <Bagging-Date>2008-01-15</Bagging-Date>
          <External-Identifier>spengler_yoshimuri_001</External-Identifier>
          <Bag-Size>260 GB</Bag-Size>
          <Payload-Oxum>279164409832.1198</Payload-Oxum>
          <Bag-Group-Identifier>spengler_yoshimuri</Bag-Group-Identifier>
          <Bag-Count>1 of 15</Bag-Count>
          <Internal-Sender-Identifier>/storage/images/yoshimuri</Internal-Sender-Identifier>
          <Internal-Sender-Description>Uncompressed greyscale TIFFs created from microfilm and are...</Internal-Sender-Description>
        </transfer_metadata>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:sourceMD>
</mets:amdSec></pre>
*When BagIt labels contain characters that are not valid in XML element names, processing continues: an error message is printed and the labels with invalid content are skipped.
</br>
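The mapping from bag-info.txt labels to child elements of <transfer_metadata>, including the skip-and-report behaviour for invalid labels, might look roughly like the following Python sketch. The function names are hypothetical and the validity test is an assumption: it accepts only a conservative ASCII subset of XML names rather than the full XML Name production. Wrapping the result in the METS sourceMD, mdWrap and xmlData elements is left out.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Assumption: a conservative ASCII subset of valid XML element names, namely a
# letter or underscore followed by letters, digits, '.', '-' or '_'.
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def parse_bag_info(text: str) -> list[tuple[str, str]]:
    """Split bag-info.txt text into (label, value) pairs.

    Lines that start with whitespace continue the previous value.
    """
    pairs: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and pairs:
            label, value = pairs[-1]
            pairs[-1] = (label, f"{value} {line.strip()}")
        elif ":" in line:
            label, value = line.split(":", 1)
            pairs.append((label.strip(), value.strip()))
    return pairs


def to_transfer_metadata(text: str) -> ET.Element:
    """Build a <transfer_metadata> element, skipping labels that fail the name check."""
    root = ET.Element("transfer_metadata")
    for label, value in parse_bag_info(text):
        if not XML_NAME.fullmatch(label):
            # Report the problem and keep processing the remaining labels.
            print(f"Skipping invalid label: {label!r}", file=sys.stderr)
            continue
        ET.SubElement(root, label).text = value
    return root
```

Calling `ET.tostring(to_transfer_metadata(text), encoding="unicode")` on the contents of a bag-info.txt file returns the serialized container; a label such as "Bad Label" (with a space) is reported on standard error and omitted.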
  
===Search contents in archival storage tab===
*Add keyword field "Transfer metadata" to the drop-down menu in search. This will search all the contents of the <transfer_metadata> container in the METS file (as indexed in Elasticsearch).
*Add keyword field "Transfer metadata (other)" to the drop-down menu in search. This will allow users to search individual fields in the <transfer_metadata> container.
**When the user selects "Transfer metadata (other)", a separate box will appear in which the user can enter the label of the specific field to be searched.
*Add ability to search date ranges.
**To search on a date range in <transfer_metadata> or one of its sub-fields, the user enters two dates in ISO date format separated by a colon. For example, "2015-01-03:2015-04-14".
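The colon-separated date range format can be sketched in Python as follows. The function names and the field name used in the usage note are hypothetical, and how Archivematica actually builds its queries is not described here. The query body uses the standard Elasticsearch range query with inclusive gte and lte bounds, and the sketch assumes plain dates without a time component.

```python
from datetime import date


def parse_date_range(expression: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD:YYYY-MM-DD' into an inclusive (start, end) pair."""
    start_text, separator, end_text = expression.partition(":")
    if not separator:
        raise ValueError("expected two ISO dates separated by a colon")
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip())
    if start > end:
        raise ValueError("start date is after end date")
    return start, end


def range_query(field: str, expression: str) -> dict:
    """Build an Elasticsearch range query body for a date-range expression."""
    start, end = parse_date_range(expression)
    return {"range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}}
```

For example, `range_query("transfer_metadata.Bagging-Date", "2015-01-03:2015-04-14")` produces a query matching dates from 3 January 2015 through 14 April 2015, where the field path is an illustrative guess at how the indexed label might be named.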
  
  
 
[[Category: Development documentation]]
 

Latest revision as of 16:41, 11 February 2020
