Difference between revisions of "Improvements/aipreadme"


Revision as of 11:17, 19 June 2017

User story

As a repository manager, I would like AIPs to be as self-describing as possible, so that future users with little or no knowledge of Archivematica or of what an AIP is will be able to understand the structure and contents of the AIPs I produce now.

Status

2017-06-01 - New Proposal

Interest

If you'd like to get involved in this development, please feel free to contribute to this wiki page or start a discussion on our user forum.

Analysis:

Currently, Archivematica AIPs are structured as a Bag (https://tools.ietf.org/html/draft-kunze-bagit-14) and contain a METS file, which describes the contents of the AIP. Details about the Archivematica AIP structure are here: https://www.archivematica.org/en/docs/archivematica-1.6/user-manual/archival-storage/aip-structure/

METS files are machine-readable, but they are not a human-friendly format.

Adding a human-readable index or description to an AIP would improve the chances of a future user understanding its structure.

Archivematica structures AIPs in a specific way, but that structure is not documented within the AIP itself. Adding more explicit documentation about the structure would help users test that AIPs are valid and help them understand what they contain.

There is a similar proposal outlined here: https://github.com/UTS-eResearch/datacrate

Use case: Add a README to each AIP

In the data/ directory (beside the METS file), add a README.html or README.md file. This is intended to be the first file opened by a person trying to examine an AIP.

The README file would include

  • some boilerplate text, describing what an AIP is
  • links to the Archivematica documentation, to METS documentation, to PREMIS docs, etc.
  • a link to the METS file
  • optionally a link to a CATALOG.html file, that includes more detailed information about the contents of the AIP.
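The list above could be turned into a generated file along these lines. This is a minimal sketch, not an existing Archivematica feature: the function names `build_readme` and `write_readme`, the Markdown layout, and the documentation URLs in `DOC_LINKS` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed boilerplate and link targets; a real README would use agreed wording.
BOILERPLATE = (
    "This readme file describes the basic structure of an AIP "
    "generated by Archivematica."
)
DOC_LINKS = {
    "Archivematica documentation": "https://www.archivematica.org/en/docs/",
    "METS": "https://www.loc.gov/standards/mets/",
    "PREMIS": "https://www.loc.gov/standards/premis/",
}


def build_readme(mets_filename, catalog_filename=None):
    """Return README text in Markdown with boilerplate and links."""
    lines = ["# README", "", BOILERPLATE, "", "## Documentation", ""]
    for label, url in DOC_LINKS.items():
        lines.append(f"- [{label}]({url})")
    lines += ["", "## Contents", "", f"- [METS file]({mets_filename})"]
    # The catalog link is optional, as in the proposal.
    if catalog_filename:
        lines.append(f"- [Catalog]({catalog_filename})")
    return "\n".join(lines) + "\n"


def write_readme(data_dir, mets_filename, catalog_filename=None):
    """Write README.md into the AIP's data/ directory, beside the METS file."""
    path = Path(data_dir) / "README.md"
    path.write_text(build_readme(mets_filename, catalog_filename), encoding="utf-8")
    return path
```

Relative links are used for the METS and catalog files so that the README keeps working wherever the AIP is copied.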

Sample README file text

This readme file describes the basic structure of an AIP generated by Archivematica.

Acronyms

AIP = Archival Information Package

METS = Metadata Encoding and Transmission Standard

OAIS = Open Archival Information System

PDI = Preservation Description Information

PREMIS = Preservation Metadata: Implementation Strategies

UUID = Universally Unique Identifier

What is Archivematica?

Archivematica is an open-source suite of tools designed to ingest diverse digital content and prepare AIPs for long-term storage. Once an AIP is generated it is not dependent on Archivematica for retrieval, and can be opened using any standard file browser. The concept of an AIP is derived from the ISO 14721:2012 Reference Model for an Open Archival Information System (OAIS), which defines it as “[a]n Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS.”

Content Information

In an Archivematica AIP, the Content Information consists primarily of the originally ingested digital objects and any preservation versions of the objects created to mitigate the risk of format obsolescence over time. The preservation copies typically have the same filenames as the original objects but with different file extensions and with UUIDs appended to the filename. For example, for an original file named BBhelmet.ai the preservation version may be named BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf.
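The naming convention described here can be checked mechanically. The following is a minimal sketch, assuming the UUID is always appended with a hyphen immediately before the file extension; the helper names `original_stem` and `is_preservation_copy_of` are hypothetical.

```python
import re
from pathlib import PurePath

# Matches a hyphen followed by a UUID at the very end of a filename stem.
UUID_SUFFIX = re.compile(
    r"-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def original_stem(preservation_filename):
    """Strip the extension and the appended UUID from a preservation copy's name."""
    stem = PurePath(preservation_filename).stem
    return UUID_SUFFIX.sub("", stem)


def is_preservation_copy_of(original_filename, preservation_filename):
    """True if the preservation name looks like original stem + '-' + UUID."""
    stem = PurePath(preservation_filename).stem
    has_uuid = UUID_SUFFIX.search(stem) is not None
    return has_uuid and original_stem(preservation_filename) == PurePath(original_filename).stem


print(original_stem("BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf"))  # BBhelmet
```

Matching on names alone is only a heuristic; the METS file remains the authoritative record of which file was derived from which.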

The originally ingested digital objects and any preservation versions are located in the objects directory of the AIP. There will be nested subdirectories in the objects directory if these subdirectories were included in the original transfer or added during SIP arrangement. The objects directory also includes a submissiondocumentation folder and a metadata folder. The submissiondocumentation folder contains documentation such as donor agreements and transfer forms, if included in the original transfer, as well as a METS file that records the contents of the original transfer(s) from which the AIP was created. The metadata folder holds any metadata files included in the original transfer, and any OCR text files generated during processing.

Preservation Description Information (PDI)

The PDI in an Archivematica AIP is recorded in a METS XML file. METS is maintained by the Library of Congress, which defines it as “a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium.” In the Archivematica AIP the METS filename consists of the name METS, followed by a UUID and the .xml file extension, separated by periods; for example METS.0ad8cdab-dbbf-4863-8a4d-9a675c227216.xml. The METS file typically consists of the following standard METS sections:

-metsHdr (METS header): basic information about the METS file;

-dmdSec (descriptive metadata section): descriptive metadata about the digital objects;

-amdSec (administrative metadata section): technical and provenance information about the digital objects;

-fileSec (file section): a list of the digital objects and an indication of their role in the AIP (original, preservation, metadata, submission documentation, license etc.);

-structMap (structural map): a physical or logical ordering of the digital objects.
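A reader who wants to confirm which of these sections a given METS file contains could do so with a few lines of code. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the filename pattern is an assumption based on the single example given above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern: "METS." + UUID + ".xml", as in the example filename.
METS_NAME = re.compile(
    r"^METS\.[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\.xml$", re.IGNORECASE
)


def is_mets_filename(name):
    """True if the name follows the METS.<UUID>.xml pattern."""
    return METS_NAME.match(name) is not None


def top_level_sections(mets_xml):
    """Count the direct children of the METS root element by local name."""
    root = ET.fromstring(mets_xml)
    counts = {}
    for child in root:
        # ElementTree tags look like "{namespace}name"; keep only the name.
        local = child.tag.split("}", 1)[-1]
        counts[local] = counts.get(local, 0) + 1
    return counts


sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    "<mets:metsHdr/><mets:amdSec/><mets:amdSec/>"
    "<mets:fileSec/><mets:structMap/></mets:mets>"
)
print(top_level_sections(sample))
```

A real METS file usually contains one amdSec per digital object, so the count for that section is expected to be greater than one.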

The technical and provenance information in the METS amdSec is recorded as PREMIS metadata. PREMIS is also a Library of Congress standard, and is described as "the international standard for metadata to support the preservation of digital objects and ensure their long-term usability." The PREMIS entities are wrapped in the METS file as follows:

amdSec

--mets:techMD (technical metadata)

PREMIS Object (e.g. UUID, size, checksum, format, original name, extracted technical metadata)

--mets:digiprovMD (digital provenance metadata)

PREMIS Event (e.g. ingestion, message digest calculation, virus scan, format identification, validation, normalization, fixity check)

PREMIS Agent (for each PREMIS Event there are three Agents: the organization, the digital preservation system, e.g. Archivematica 1.x, and the logged-in user)

--mets:rightsMD (rights metadata)

PREMIS Rights (only included if the user added rights metadata prior to or during ingest)
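The wrapping described above means that PREMIS Events can be found by looking inside each digiprovMD element. The sketch below lists event types from a METS document; it matches elements by local name so that it does not depend on a particular PREMIS version. The function name `premis_event_types`, the sample XML, and the PREMIS namespace used in the sample are illustrative assumptions, not taken from a real AIP.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def premis_event_types(mets_xml):
    """Return the text of every eventType found under a digiprovMD element."""
    root = ET.fromstring(mets_xml)
    types = []
    for element in root.iter():
        if local_name(element.tag) != "digiprovMD":
            continue
        for descendant in element.iter():
            if local_name(descendant.tag) == "eventType" and descendant.text:
                types.append(descendant.text.strip())
    return types


# Illustrative fragment only; real METS files carry far more detail.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:premis="info:lc/xmlschema/premis-v2">
  <mets:amdSec>
    <mets:digiprovMD>
      <mets:mdWrap><mets:xmlData>
        <premis:event><premis:eventType>ingestion</premis:eventType></premis:event>
      </mets:xmlData></mets:mdWrap>
    </mets:digiprovMD>
    <mets:digiprovMD>
      <mets:mdWrap><mets:xmlData>
        <premis:event><premis:eventType>virus check</premis:eventType></premis:event>
      </mets:xmlData></mets:mdWrap>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>
"""
print(premis_event_types(sample))
```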

Use case: Create and Use a Bag Profile

https://github.com/ruebot/bagit-profiles

Archivematica could define a bag profile and reference it in the AIPs it produces. This would make AIPs more easily machine-readable.