Improvements/aipreadme
Revision as of 11:21, 18 June 2017
User story
As a repository manager, I would like AIPs to be as self-describing as possible, so that future users with little or no knowledge of Archivematica, or of what an AIP is, will be able to understand the structure and contents of the AIPs I produce now.
Status
2017-06-01 - New Proposal
Interest
If you'd like to get involved in this development, please feel free to contribute to this wiki page or start a discussion on our user forum.
Analysis:
Currently, Archivematica AIPs are structured as Bags (https://tools.ietf.org/html/draft-kunze-bagit-14) and contain a METS file that describes the contents of the AIP. Details about the Archivematica AIP structure are available here: https://www.archivematica.org/en/docs/archivematica-1.6/user-manual/archival-storage/aip-structure/
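Because the AIP is a Bag, a future user can check the integrity of its payload with nothing more than the manifest file and a checksum tool. The sketch below assumes a manifest-sha256.txt payload manifest (an AIP may use a different algorithm), treats each manifest line as a checksum followed by a path relative to the bag root, and does not decode BagIt's percent-encoding of unusual characters in paths. verify_payload_manifest is a hypothetical helper, not part of Archivematica.

```python
import hashlib
from pathlib import Path


def verify_payload_manifest(bag_dir: str, algorithm: str = "sha256") -> list[str]:
    """Return manifest paths whose files are missing or fail checksum comparison.

    Assumes manifest-<algorithm>.txt lines of the form '<checksum> <relative path>'.
    Percent-encoded characters in paths are not decoded in this sketch.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        digest = hashlib.new(algorithm)
        with target.open("rb") as fh:
            # Hash in chunks so large objects do not have to fit in memory.
            for chunk in iter(lambda: fh.read(1 << 16), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling verify_payload_manifest("/path/to/aip") returns an empty list when every listed file exists and matches its recorded checksum. Files present under data/ but absent from the manifest are not detected by this sketch.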
METS files are machine-readable but not human-friendly.
Adding a human-readable index or description to an AIP would improve the chances that a future user understands its structure.
Archivematica structures AIPs in a specific way, but that structure is not documented within the AIP itself. Adding explicit documentation of the structure would help users check that AIPs are valid and understand how they are organized.
There is a similar proposal outlined here: https://github.com/UTS-eResearch/datacrate
Use case: Add a README to each AIP
In the data/ directory (beside the METS file), add a README.html or README.md file. This would be intended as the first file a person opens when examining an AIP.
The README file would include:
- some boilerplate text describing what an AIP is
- links to the Archivematica documentation, the METS documentation, the PREMIS documentation, etc.
- a link to the METS file
- optionally, a link to a CATALOG.html file with more detailed information about the contents of the AIP.
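A README along these lines could be generated at packaging time. This is a minimal sketch, assuming the AIP's data/ directory holds exactly one METS.<UUID>.xml file; the function name write_readme, the link list, the heading wording, and the choice of Markdown over HTML are illustrative assumptions rather than existing Archivematica behavior.

```python
from pathlib import Path

# Illustrative links; the exact set of references is an assumption.
DOC_LINKS = {
    "Archivematica AIP structure": "https://www.archivematica.org/en/docs/archivematica-1.6/user-manual/archival-storage/aip-structure/",
    "METS (Library of Congress)": "https://www.loc.gov/standards/mets/",
    "PREMIS (Library of Congress)": "https://www.loc.gov/standards/premis/",
}


def write_readme(aip_dir: str, catalog: bool = False) -> Path:
    """Write data/README.md next to the AIP's METS file (hypothetical helper)."""
    data = Path(aip_dir) / "data"
    mets_files = sorted(data.glob("METS.*.xml"))
    if len(mets_files) != 1:
        raise ValueError(f"expected one METS file in {data}, found {len(mets_files)}")
    mets_name = mets_files[0].name

    lines = [
        "# About this AIP",
        "",
        "This directory is an Archival Information Package (AIP) generated by Archivematica.",
        "",
        "## Documentation",
        "",
    ]
    lines += [f"- [{label}]({url})" for label, url in DOC_LINKS.items()]
    lines += ["", "## Metadata", "", f"- METS file: [{mets_name}]({mets_name})"]
    if catalog:
        # Optional pointer to a more detailed listing of the AIP's contents.
        lines.append("- Detailed contents: [CATALOG.html](CATALOG.html)")

    readme = data / "README.md"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Because BagIt payload manifests must list every file under data/, a README written into data/ after the Bag is created would also have to be added to the manifests; generating it before bagging avoids that extra step.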
Sample README file text
This README file describes the basic structure of an AIP generated by Archivematica.
Acronyms
AIP = Archival Information Package
METS = Metadata Encoding and Transmission Standard
PDI = Preservation Description Information
PREMIS = Preservation Metadata Implementation Strategies
OAIS = Open Archival Information System
UUID = Universally Unique Identifier
What is Archivematica?
Archivematica is an open-source suite of tools designed to ingest diverse digital content and prepare AIPs for long-term storage. Once an AIP is generated, it does not depend on Archivematica for retrieval and can be opened using any standard file browser. The concept of an AIP is derived from the ISO 14721:2012 Reference Model for an Open Archival Information System (OAIS), which defines it as “[a]n Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS.”
Content Information
In an Archivematica AIP, the Content Information consists primarily of the originally ingested digital objects and any preservation versions of those objects created to mitigate the risk of format obsolescence over time. The preservation copies typically have the same filenames as the original objects, but with different file extensions and with UUIDs appended. For example, for an original file named BBhelmet.ai, the preservation version may be named BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf.
The originally ingested digital objects and any preservation versions are located in the objects directory of the AIP. The objects directory contains nested subdirectories if these were included in the original transfer or added during SIP arrangement. It also includes a submissiondocumentation folder and a metadata folder. The submissiondocumentation folder contains documentation such as donor agreements and transfer forms, if these were included in the original transfer, as well as a METS file that records the contents of the original transfer(s) from which the AIP was created. The metadata folder holds any metadata files included in the original transfer and any OCR text files generated during processing.
Preservation Description Information (PDI)
The PDI in an Archivematica AIP is recorded in a METS XML file. METS is maintained by the Library of Congress, which defines it as “a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium.” In an Archivematica AIP, the METS filename consists of the string METS followed by a UUID and the .xml extension; for example, METS.0ad8cdab-dbbf-4863-8a4d-9a675c227216.xml. The METS file typically consists of the following standard METS sections:
- metsHdr (METS header): basic information about the METS file;
- dmdSec (descriptive metadata section): descriptive metadata about the digital objects;
- amdSec (administrative metadata section): technical and provenance information about the digital objects;
- fileSec (file section): a list of the digital objects and an indication of their role in the AIP (original, preservation, metadata, submission documentation, license, etc.);
- structMap (structural map): a physical or logical ordering of the digital objects.
The technical and provenance information in the METS amdSec is recorded as PREMIS metadata....
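The naming conventions and METS sections described in the sample README text can also be inspected programmatically. This minimal sketch assumes the patterns shown in the examples (a preservation copy named <original stem>-<UUID>.<extension>, and a METS file named METS.<UUID>.xml directly inside data/) and relies only on the published METS namespace URI; the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# METS namespace URI as published by the Library of Congress.
METS_NS = "http://www.loc.gov/METS/"
UUID_RE = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

# Assumed naming patterns, inferred from the examples in the sample text.
METS_NAME = re.compile(rf"^METS\.({UUID_RE})\.xml$")
PRESERVATION_NAME = re.compile(rf"^(?P<stem>.+)-(?P<uuid>{UUID_RE})\.(?P<ext>[^.]+)$")

SECTIONS = ("metsHdr", "dmdSec", "amdSec", "fileSec", "structMap")


def split_preservation_name(filename: str):
    """Return (stem, uuid, extension) for a name like 'x-<UUID>.pdf', else None."""
    m = PRESERVATION_NAME.match(filename)
    return (m["stem"], m["uuid"], m["ext"]) if m else None


def find_mets(data_dir: str) -> Path:
    """Return the single METS.<UUID>.xml file directly inside data_dir."""
    matches = [p for p in Path(data_dir).iterdir() if METS_NAME.match(p.name)]
    if len(matches) != 1:
        raise ValueError(f"expected one METS file in {data_dir}, found {len(matches)}")
    return matches[0]


def count_sections(mets_path: Path) -> dict:
    """Count the top-level METS sections listed in the sample text."""
    root = ET.parse(mets_path).getroot()
    # Each section is a direct child of the root <mets:mets> element.
    return {name: len(root.findall(f"{{{METS_NS}}}{name}")) for name in SECTIONS}
```

For example, split_preservation_name("BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf") returns ("BBhelmet", "e3a3988d-8149-49ea-adc5-c255fb68d4f9", "pdf"). Since the filename convention is only typical, matching preservation copies to originals by stem is a heuristic; the fileSec in the METS file, which records each object's role, is the more reliable source.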
Use case: Create and Use a Bag Profile
https://github.com/ruebot/bagit-profiles
Archivematica could define a bag profile and reference it in the AIPs it produces. This would help make AIPs more easily machine-readable.
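To illustrate what a bag profile could enforce, the sketch below checks a Bag against a small profile dictionary. It covers only two parts of the BagIt Profiles specification (required or restricted bag-info.txt tags under "Bag-Info", and "Manifests-Required"); the helper names and any profile contents are hypothetical, since no Archivematica profile exists yet. The specification also describes a BagIt-Profile-Identifier tag in bag-info.txt, which is how an AIP could reference its profile.

```python
from pathlib import Path


def read_bag_info(bag: Path) -> dict[str, list[str]]:
    """Parse bag-info.txt into {label: [values]}, joining indented continuation lines."""
    info: dict[str, list[str]] = {}
    path = bag / "bag-info.txt"
    if not path.is_file():
        return info
    label = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if line[:1] in (" ", "\t") and label is not None:
            info[label][-1] += " " + line.strip()  # continuation of the previous value
        elif ":" in line:
            name, value = line.split(":", 1)
            label = name.strip()
            info.setdefault(label, []).append(value.strip())
    return info


def check_profile(bag_dir: str, profile: dict) -> list[str]:
    """Return problems found when checking a Bag against a (partial) BagIt profile."""
    bag = Path(bag_dir)
    problems = []
    info = read_bag_info(bag)
    for tag, rules in profile.get("Bag-Info", {}).items():
        values = info.get(tag, [])
        if rules.get("required") and not values:
            problems.append(f"missing required bag-info tag: {tag}")
        allowed = rules.get("values")
        if allowed:
            problems += [f"{tag}: value {v!r} not allowed" for v in values if v not in allowed]
    for algorithm in profile.get("Manifests-Required", []):
        if not (bag / f"manifest-{algorithm}.txt").is_file():
            problems.append(f"missing manifest-{algorithm}.txt")
    return problems
```

A profile stored as JSON could be loaded with json.load and passed in as the profile argument; an empty result means the Bag satisfies the rules this sketch checks, not the full specification.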