Metadata elements
Revision as of 13:05, 2 September 2010 by Evelyn McLellan (talk | contribs)
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
This process involves:
- Using the InterPARES Chain of Preservation (COP) model and the COP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
- Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements
- Comparing the required elements with the existing metadata in order to determine what gaps exist in Archivematica
- Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
- Structuring the required elements into the Repository eXchange Package (RXP) specification
- Determining what metadata belongs in the DIP(s)
Map of Archivematica 0.6 metadata to PREMIS elements
Source: /data/logs/MD5checksum.txt | |||||
Process: Produced when quarantine period expires. Provides checksums for each object in the SIP. Note that if zipped files are present, a checksum is generated for the zipped file and not for each object within it. | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
---|---|---|---|---|---|
Checksum | Object | 1.5.2 fixity | 1.5.2.2 messageDigest | 326e0206ae83f815e4be5f28464f6ac6 | |
Source: /data/logs/filenameCleanup.log | |||||
Process: Produced when quarantine period expires, prior to unpacking of any zipped files. If prohibited characters were present in filenames, provides crosswalk between original and "cleaned up" filenames. | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
Original filename | Object | 1.6 originalName | none | Syllabus final.doc | |
Cleaned-up filename | Event | 2.5.2 eventOutcomeDetail | 2.5.2.1 eventOutcomeDetailNote | Syllabus_final.doc | |
Source: /data/logs/virusScan.log | |||||
Process: Produced when ingested files are scanned for viruses and malware | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
Scan result | Event | 2.5 eventOutcomeInformation | 2.5.1 eventOutcome | OK | |
Source: /data/logs/fileUUIDs.log | |||||
Process: Produced after prohibited characters are removed from filenames and any zipped files have been unpacked. Provides a crosswalk between cleaned-up filenames and UUIDs. | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
Universally unique identifier (UUID) | Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | |
Source: /data/logs/FITS-[UUID]-[SIP].xml (FITS output reports) | |||||
Process: Produced when FITS tool identifies and validates formats and extracts technical metadata | |||||
FITS element | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
format | Object | 1.5.4.1 formatDesignation | 1.5.4.1.1 formatName | | |
version | Object | 1.5.4.1 formatDesignation | 1.5.4.1.2 formatVersion | 6.0 | |
externalIdentifier | Object | 1.5.4.2 formatRegistry | 1.5.4.2.2 formatRegistryKey | fmt/10 | |
Size | Object | 1.5 objectCharacteristics | 1.5.3 size | 125968 | |
ImageWidth | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 2464 | |
ImageHeight | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 3248 | |
SamplesPerPixel | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 3 | |
XResolution | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 300 | |
YResolution | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 300 | |
duration | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 0:2:26:16 | |
bitDepth/bitsPerSample | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 16 | |
sampleRate | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 48000.0 | |
channels | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 2 | |
aes:channelAssignment | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | | |
VideoFrameRate | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | | |
AspectRatio | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 1:1 | |
AudioFormat (for audio streams in video files) | |||||
AudioChannels (for audio streams in video files) | |||||
AudioBitsPerSample (for audio streams in video files) | |||||
AudioSampleRate (for audio streams in video files) | |||||
PageCount | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 16 | |
WordCount | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 876 | |
Paragraphs | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 19 | |
Slides | Object | 1.4 significantProperties | 1.4.2 significantPropertiesValue | 27 | |
Source: /data/logs/normalization.log | |||||
Process: Produced during normalization to preservation and access formats | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
Name of normalization tool | Agent | 3.2 agentName | none | FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2 | |
Event description | Event | 2.2 eventType | none | Normalizing | |
Processing status | Event | 2.5 eventOutcomeInformation | 2.5.1 eventOutcome | Processing completed | |
Normalization result | Event | 2.5.2 eventOutcomeDetail | 2.5.2.1 eventOutcomeDetailNote | | |
Source: /data/logs/MD5checksum.txtprepareAIP_check.log | |||||
Process: Produced after file normalization process. Checks that checksums for files in the SIP have not changed during normalization. | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
Pass/fail notification | Event | 2.5 eventOutcomeInformation | 2.5.1 eventOutcome | | |
Source: /data/logs/AIP.MD5checksum.txt | |||||
Process: Produced during BagIt process. Provides checksums for the AIP and for each original and normalized file in the AIP. | |||||
Description | PREMIS entity | PREMIS semantic unit | PREMIS semantic component | Sample value(s) | |
AIP checksum | Object | 1.5.2 fixity | 1.5.2.2 messageDigest | 12b86e038bf0bddd5aba110c35f288b8 | |
File checksum | Object | 1.5.2 fixity | 1.5.2.2 messageDigest | 326e0206ae83f815e4be5f28464f6ac6 |
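
To make the mapping more concrete, the following is a minimal sketch of how a few of the sample values above could be arranged as a PREMIS-style object description. It is not taken from Archivematica 0.6: the function names `md5_hex` and `build_premis_object`, the identifier type "UUID", and the algorithm label "MD5" are assumptions for illustration, and XML namespaces, element ordering rules and schema validation are left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string (1.5.2.2 messageDigest)."""
    return hashlib.md5(data).hexdigest()


def build_premis_object(uuid: str, digest: str, size: int, original_name: str) -> ET.Element:
    """Build an un-namespaced, illustrative PREMIS-like <object> element (hypothetical helper)."""
    obj = ET.Element("object")

    # 1.1 objectIdentifier: the type value "UUID" is an assumption, it is not in the logs
    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "UUID"
    ET.SubElement(ident, "objectIdentifierValue").text = uuid

    # 1.5 objectCharacteristics: fixity (1.5.2) and size (1.5.3)
    chars = ET.SubElement(obj, "objectCharacteristics")
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "MD5"  # assumed label
    ET.SubElement(fixity, "messageDigest").text = digest
    ET.SubElement(chars, "size").text = str(size)

    # 1.6 originalName: the filename before clean-up
    ET.SubElement(obj, "originalName").text = original_name
    return obj


if __name__ == "__main__":
    # Sample values copied from the table above
    element = build_premis_object(
        uuid="270bd067-0483-4c5f-bdec-f2cbd6e651aa",
        digest="326e0206ae83f815e4be5f28464f6ac6",
        size=125968,
        original_name="Syllabus final.doc",
    )
    print(ET.tostring(element, encoding="unicode"))
```

In a real workflow the digest would be computed from the file's bytes with something like `md5_hex`, rather than copied from a log, and the fragment would be wrapped in a METS file with the appropriate PREMIS namespace.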
Mandatory PREMIS elements (mandatory semantic units + mandatory components)
Entity | Semantic unit | Semantic component | Present in Archivematica? |
---|---|---|---|
Object | 1.1 objectIdentifier | 1.1.1 objectIdentifierType | No |
Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | Yes |
Object | 1.2 objectCategory | none | No |
Object | 1.5 objectCharacteristics | 1.5.1 compositionLevel | No |
Object | 1.5.4 objectCharacteristics/format | Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used | |
Object | 1.7 storage | Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded." | No, but retrieval may be managed through UUIDs. |
Event | 2.1 eventIdentifier | 2.1.1 eventIdentifierType | No |
Event | 2.1 eventIdentifier | 2.1.2 eventIdentifierValue | No |
Event | 2.2 eventType | none | Partial |
Event | 2.3 eventDateTime | none | Partial |
Agent | 3.1 agentIdentifier | 3.1.1 agentIdentifierType | No |
Agent | 3.1 agentIdentifier | 3.1.2 agentIdentifierValue | No |
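
The gap analysis summarized in this table can be sketched as a simple filter over the rows. The example below is illustrative only: the `MANDATORY` list and the `find_gaps` function are hypothetical names, the statuses are transcribed from the table above, and the two rows without a plain Yes/No/Partial status (1.5.4 format and 1.7 storage) are left out.

```python
# (entity, mandatory element, "Present in Archivematica?") transcribed from the table above.
# Rows 1.5.4 and 1.7 are omitted because their status is blank or qualified.
MANDATORY = [
    ("Object", "1.1.1 objectIdentifierType", "No"),
    ("Object", "1.1.2 objectIdentifierValue", "Yes"),
    ("Object", "1.2 objectCategory", "No"),
    ("Object", "1.5.1 compositionLevel", "No"),
    ("Event", "2.1.1 eventIdentifierType", "No"),
    ("Event", "2.1.2 eventIdentifierValue", "No"),
    ("Event", "2.2 eventType", "Partial"),
    ("Event", "2.3 eventDateTime", "Partial"),
    ("Agent", "3.1.1 agentIdentifierType", "No"),
    ("Agent", "3.1.2 agentIdentifierValue", "No"),
]


def find_gaps(rows):
    """Group every element whose status is not "Yes" by PREMIS entity."""
    gaps = {}
    for entity, element, status in rows:
        if status != "Yes":
            gaps.setdefault(entity, []).append((element, status))
    return gaps


if __name__ == "__main__":
    for entity, missing in find_gaps(MANDATORY).items():
        print(entity)
        for element, status in missing:
            print(f"  {element}: {status}")
```

Treating "Partial" as a gap is a design choice made here so that partially captured elements such as eventType and eventDateTime still show up for follow-up.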