Metadata elements
Revision as of 16:57, 8 September 2010
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
This process involves:
- Using the InterPARES Chain of Preservation (COP) model and the COP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
- Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map it to METS and PREMIS elements (see Existing elements)
- Comparing the required elements (first step) with the existing metadata (second step) to determine what gaps exist in Archivematica
- Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
- Structuring the required elements into the Repository eXchange Package (RXP) specification
- Determining what metadata belongs in the DIP(s)
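The comparison step is essentially a set difference between what PREMIS requires and what Archivematica already records. A minimal sketch, assuming both sides have already been reduced to lists of PREMIS semantic component names; `find_gaps` is a hypothetical helper, and the sample lists mirror the object rows of the mandatory-elements table at the end of this page.

```python
def find_gaps(required, existing):
    """Return the required elements absent from the existing metadata,
    preserving the order in which they are required."""
    present = set(existing)
    return [element for element in required if element not in present]


# Sample values mirror the object rows of the mandatory-elements table:
# only objectIdentifierValue is currently present in Archivematica.
required = [
    "objectIdentifierType",
    "objectIdentifierValue",
    "objectCategory",
    "compositionLevel",
]
existing = ["objectIdentifierValue"]

print(find_gaps(required, existing))
# ['objectIdentifierType', 'objectCategory', 'compositionLevel']
```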
Proposed PREMIS metadata for original file
This table is a template for metadata elements for the original file. Please note the following:
- The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
- For most files, the relationship semantic unit would be used twice: once for the preservation copy and once for the access copy
- The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
objectIdentifier | objectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component |
objectCategory | none | file | mandatory unit and component |
significantProperties | significantPropertiesType | ImageWidth | repeatable semantic unit |
significantProperties | significantPropertiesValue | 1024 | repeatable semantic unit |
objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver | |
objectCharacteristics | size | 787510 | |
objectCharacteristics/format/formatDesignation | formatName | Windows Bitmap | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatDesignation | formatVersion | 3.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/116 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
relationship | relationshipType | derivation | |
relationship | relationshipSubType | is source of | |
relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object |
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component if there is a related object |
relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded." |
relationship/relatedEventIdentification | relatedEventIdentifierValue | 006 | "For derivative relationships between objects relatedEventIdentification must be recorded." |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component |
eventType | none | normalization | mandatory unit and component |
eventDateTime | none | 2009-12-01T09:09:00-02:00 | mandatory unit and component |
eventDetail | none | program=ImageMagick; version=6.6.4.0 | This element can be used to record information about software used and eliminates the need to have agent entities for software programs |
eventOutcomeInformation | eventOutcome | normalization successful | |
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | used to link an agent to an event; not mandatory but recommended |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | used to link an agent to an event; not mandatory but recommended |
agentIdentifier | agentIdentifierType | repository code | mandatory unit and component |
agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component |
agentName | none | City of Vancouver Archives | |
agentType | none | organization | |
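The object rows of the table can be serialized with Python's standard `xml.etree.ElementTree`. This is a sketch only: it assumes the PREMIS 2 namespace, covers just the object entity without its relationship (events and agents would be built the same way), nests elements by following the "Semantic unit" paths literally, and has not been validated against the PREMIS XML schema. `build_object` and `sub` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 2.x namespace; adjust to the PREMIS version actually targeted.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def sub(parent, tag, text=None):
    """Append a PREMIS-namespaced child element, optionally with text content."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        element.text = str(text)
    return element


def build_object(uuid, md5, size, format_name, format_version, registry_key):
    """Build an object element whose nesting follows the table's semantic unit paths."""
    obj = ET.Element(f"{{{PREMIS_NS}}}object")
    identifier = sub(obj, "objectIdentifier")
    sub(identifier, "objectIdentifierType", "UUID")
    sub(identifier, "objectIdentifierValue", uuid)
    sub(obj, "objectCategory", "file")
    characteristics = sub(obj, "objectCharacteristics")
    sub(characteristics, "compositionLevel", 0)
    fixity = sub(characteristics, "fixity")
    sub(fixity, "messageDigestAlgorithm", "MD5")
    sub(fixity, "messageDigest", md5)
    sub(fixity, "messageDigestOriginator", "City of Vancouver")
    sub(characteristics, "size", size)
    fmt = sub(characteristics, "format")
    designation = sub(fmt, "formatDesignation")
    sub(designation, "formatName", format_name)
    sub(designation, "formatVersion", format_version)
    registry = sub(fmt, "formatRegistry")
    sub(registry, "formatRegistryName", "PRONOM")
    sub(registry, "formatRegistryKey", registry_key)
    return obj


original = build_object(
    "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "e479688508922354bdab09bca60d8d0e",
    787510,
    "Windows Bitmap",
    "3.0",
    "fmt/116",
)
print(ET.tostring(original, encoding="unicode"))
```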
Proposed PREMIS metadata for normalized file (preservation copy)
Unlike the table above, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and checksum generation.
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
objectIdentifier | objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component |
objectCategory | none | file | mandatory unit and component |
objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver | |
objectCharacteristics/format/formatDesignation | formatName | Tagged Image File Format | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatDesignation | formatVersion | 6.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/10 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
relationship | relationshipType | derivation | |
relationship | relationshipSubType | has source | |
relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | |
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component |
eventType | none | creation | mandatory unit and component |
eventDateTime | none | 2010-08-01T09:08:44-03:00 | mandatory unit and component |
eventDetail | none | program=ImageMagick; version=6.6.4.0; command=%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat% | |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 002 | mandatory unit and component |
eventType | none | message digest calculation | mandatory unit and component |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | mandatory unit and component |
eventDetail | none | program=MD5deep; version=3.6 | |
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | used to link an agent to an event; not mandatory but recommended |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | used to link an agent to an event; not mandatory but recommended |
agentIdentifier | agentIdentifierType | repository code | mandatory unit and component |
agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component |
agentName | none | City of Vancouver Archives | |
agentType | none | organization | |
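The two tables record the same derivation from both ends: the original "is source of" the preservation copy, which in turn "has source" the original. A minimal consistency check, assuming objects have been loaded into the hypothetical `PremisObject` and `Relationship` classes below (they are not part of any PREMIS library), flags derivation links whose related object does not point back with the inverse subtype.

```python
from dataclasses import dataclass, field

# The derivation subtypes used in the tables, each paired with its inverse.
INVERSE = {"is source of": "has source", "has source": "is source of"}


@dataclass
class Relationship:
    relationship_type: str
    sub_type: str
    related_object_uuid: str


@dataclass
class PremisObject:
    uuid: str
    relationships: list = field(default_factory=list)


def missing_inverses(objects):
    """Return (uuid, sub_type, related_uuid) for each derivation link whose
    related object does not point back with the inverse subtype."""
    by_uuid = {obj.uuid: obj for obj in objects}
    problems = []
    for obj in objects:
        for rel in obj.relationships:
            if rel.relationship_type != "derivation":
                continue
            target = by_uuid.get(rel.related_object_uuid)
            expected = INVERSE.get(rel.sub_type)
            has_inverse = target is not None and any(
                back.relationship_type == "derivation"
                and back.sub_type == expected
                and back.related_object_uuid == obj.uuid
                for back in target.relationships
            )
            if not has_inverse:
                problems.append((obj.uuid, rel.sub_type, rel.related_object_uuid))
    return problems


ORIGINAL = "0db50321-6d7b-4291-89ec-a8b0adc1ff96"
PRESERVATION = "270bd067-0483-4c5f-bdec-f2cbd6e651aa"
objects = [
    PremisObject(ORIGINAL, [Relationship("derivation", "is source of", PRESERVATION)]),
    PremisObject(PRESERVATION, [Relationship("derivation", "has source", ORIGINAL)]),
]
print(missing_inverses(objects))  # []
```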
Events requiring metadata
Receive SIP (the SIP is placed in 1-receiveSIP)
Semantic component | Sample value(s) | Automated? | Notes |
---|---|---|---|
2.1.1 eventIdentifierType | | Y | |
2.1.2 eventIdentifierValue | | Y | |
3.1.1 agentIdentifierType | user account | Y | |
3.1.2 agentIdentifierValue | demo | Y | |
3.1.1 agentIdentifierType | workstation id | Y | |
3.1.2 agentIdentifierValue | archivematica-1 | Y | |
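Because every row for this event is automated, the record can be assembled when the SIP arrives and linked to both agents (the user account and the workstation). A sketch under stated assumptions: `make_event` is hypothetical, a random UUID stands in for the event identifier (the tables above use an "Archivematica ID" instead), the eventType string is made up, and the timestamp is ISO 8601 with a UTC offset to match the eventDateTime samples.

```python
import uuid
from datetime import datetime, timedelta, timezone


def make_event(event_type, agents, when=None):
    """Return a PREMIS-style event dict linked to each (type, value) agent pair."""
    if when is None:
        when = datetime.now().astimezone()  # local time carrying its UTC offset
    return {
        "eventIdentifierType": "UUID",  # assumption, see lead-in
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": event_type,
        "eventDateTime": when.isoformat(timespec="seconds"),
        "linkingAgentIdentifiers": [
            {
                "linkingAgentIdentifierType": agent_type,
                "linkingAgentIdentifierValue": agent_value,
            }
            for agent_type, agent_value in agents
        ],
    }


event = make_event(
    "receive SIP",  # hypothetical eventType value
    [("user account", "demo"), ("workstation id", "archivematica-1")],
    when=datetime(2010, 8, 1, 9, 8, 44, tzinfo=timezone(timedelta(hours=-3))),
)
print(event["eventDateTime"])  # 2010-08-01T09:08:44-03:00
```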
Check checksums
Metadata for each file in the SIP
Semantic component | Sample value(s) | Automated? | Notes |
---|---|---|---|
2.1.1 eventIdentifierType | | Y | |
2.1.2 eventIdentifierValue | | Y | |
2.2 eventType | | Y | |
2.3 eventDateTime | | Y | |
3.1.1 agentIdentifierType | software | Y | |
3.1.2 agentIdentifierValue | MD5sum | Y | |
2.5.1 eventOutcome | pass; fail | Y | |
2.5.2 eventOutcomeDetail | j6059_02.wav FAILED | Y | |
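A sketch of the per-file check, assuming the SIP arrives with a mapping of file names to MD5 digests (reading that manifest is left out). It uses Python's `hashlib` rather than the md5sum tool named in the table, and reports results with the "pass"/"fail" values and the "FAILED" detail style from the sample values. `md5_hex` and `check_checksums` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Compute a file's MD5 digest in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_checksums(sip_dir, expected):
    """Verify files against expected MD5 digests.

    `expected` maps a file name (relative to sip_dir) to the digest supplied
    with the SIP. Returns (eventOutcome, eventOutcomeDetail); a missing file
    counts as a failure.
    """
    failed = []
    for name, digest in sorted(expected.items()):
        path = Path(sip_dir) / name
        if not path.is_file() or md5_hex(path) != digest.lower():
            failed.append(name)
    outcome = "fail" if failed else "pass"
    detail = "; ".join(f"{name} FAILED" for name in failed)
    return outcome, detail
```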
Generate checksums
Metadata for each file in the SIP for which a checksum is generated by Archivematica
Semantic component | Sample value(s) | Automated? | Notes |
---|---|---|---|
2.1.1 eventIdentifierType | | Y | |
2.1.2 eventIdentifierValue | | Y | |
2.2 eventType | | Y | |
2.3 eventDateTime | | Y | |
3.1.1 agentIdentifierType | software | Y | |
3.1.2 agentIdentifierValue | MD5sum | Y | |
1.5.2.1 messageDigestAlgorithm | MD5 | Y | |
1.5.2.2 messageDigest | fa10ee76a575bafe43335abf6cd60bae | Y | |
1.5.2.3 messageDigestOriginator | City of Vancouver | Y | |
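The three fixity components can be generated together for every file in a SIP. A minimal sketch, assuming the SIP is an ordinary directory tree and that records are keyed by path relative to the SIP root; `fixity_records` is a hypothetical helper, and Python's `hashlib` stands in for whatever checksum tool is actually used.

```python
import hashlib
from pathlib import Path


def fixity_records(sip_dir, originator="City of Vancouver"):
    """Return a PREMIS-style fixity record for every file under sip_dir,
    keyed by its path relative to sip_dir (forward slashes)."""
    root = Path(sip_dir)
    records = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.md5()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        records[path.relative_to(root).as_posix()] = {
            "messageDigestAlgorithm": "MD5",
            "messageDigest": digest.hexdigest(),
            "messageDigestOriginator": originator,
        }
    return records
```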
Review SIP
Semantic component | Sample value(s) | Automated? | Notes |
---|---|---|---|
2.1.1 eventIdentifierType | | Y | |
2.1.2 eventIdentifierValue | | Y | |
2.2 eventType | | Y | |
2.3 eventDateTime | | Y | |
3.1.1 agentIdentifierType | user account | Y | |
3.1.2 agentIdentifierValue | demo | Y | |
2.5.1 eventOutcome | pass; conditional pass | N | A SIP that fails review does not go on to become an AIP, so "fail" is never recorded as an outcome |
2.5.2 eventOutcomeDetail | Some files missing; appraisal required | N | Mandatory if eventOutcome = conditional pass |
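Because both outcome fields are entered manually, the two rules in the table (only "pass" or "conditional pass", and a mandatory detail for a conditional pass) are worth checking before the event is saved. A minimal sketch; `review_outcome_problems` is a hypothetical helper operating on a plain dict.

```python
ALLOWED_OUTCOMES = ("pass", "conditional pass")


def review_outcome_problems(event):
    """Return the problems found in a Review SIP event's outcome fields (empty if none)."""
    problems = []
    outcome = event.get("eventOutcome")
    if outcome not in ALLOWED_OUTCOMES:
        problems.append(f"eventOutcome must be one of {ALLOWED_OUTCOMES}, got {outcome!r}")
    detail = (event.get("eventOutcomeDetail") or "").strip()
    if outcome == "conditional pass" and not detail:
        problems.append("eventOutcomeDetail is mandatory when eventOutcome is 'conditional pass'")
    return problems


print(review_outcome_problems({"eventOutcome": "conditional pass"}))
```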
Quarantine SIP
- When the SIP entered quarantine and when it was released
Unpack zipped files
- Tool used, time unpacked, event outcome (successful or not), and a map from each zipped file to its unzipped contents (one map entry per unzipped file, linked to the event)
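The unpacking metadata can be captured while extracting. A sketch assuming ZIP archives and Python's standard `zipfile` module as the tool (this page does not name the tool Archivematica uses); `unpack_with_map`, the eventType string, and the `archive`/`contents` keys are hypothetical.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def unpack_with_map(zip_path, dest_dir):
    """Extract zip_path into dest_dir; return event metadata plus a map from
    each member of the archive to its extracted path."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    started = datetime.now().astimezone().isoformat(timespec="seconds")
    try:
        with zipfile.ZipFile(zip_path) as archive:
            members = [info.filename for info in archive.infolist() if not info.is_dir()]
            archive.extractall(dest_dir)
        outcome = "successful"
    except zipfile.BadZipFile:
        members, outcome = [], "failed"
    return {
        "eventType": "unpacking",  # hypothetical eventType value
        "eventDateTime": started,
        "eventDetail": "program=Python zipfile",  # record the tool actually used here
        "eventOutcome": outcome,
        "archive": zip_path.name,
        # One entry per unpacked file: member name inside the zip -> extracted path.
        "contents": {name: str(dest_dir / name) for name in members},
    }
```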
Assign UUIDs
- The standard event elements, plus a map from each original file name to its UUID
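A minimal sketch of the name-to-UUID map, assuming random (version 4) UUIDs and unique file names within a SIP. Keeping the original name alongside the identifier, for example in PREMIS's originalName semantic unit, is one option; this page only asks for the map. `assign_uuids` and `identifier_records` are hypothetical helpers.

```python
import uuid


def assign_uuids(file_names):
    """Map each original file name to a new random (version 4) UUID string."""
    mapping = {}
    for name in file_names:
        if name in mapping:
            raise ValueError(f"duplicate file name: {name!r}")
        mapping[name] = str(uuid.uuid4())
    return mapping


def identifier_records(mapping):
    """Pair each UUID with the original name it replaces."""
    return [
        {"objectIdentifierType": "UUID", "objectIdentifierValue": value, "originalName": name}
        for name, value in mapping.items()
    ]
```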
Remove prohibited characters
- The standard event elements, plus a map from each original file name to its sanitized name
Virus scan
- The standard event elements, plus the result for each file (use eventOutcomeDetail to describe the type of failure, such as the type of malware found)
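A sketch of turning virus-scanner output into a per-file outcome, with the malware name in eventOutcomeDetail. It assumes the scanner prints one line per file as `path: OK` or `path: <signature> FOUND` (the style of ClamAV's `clamscan`; confirm against the scanner and version actually in use) and ignores any other lines, such as the scan summary. The sample output below is made up for illustration, and `scan_outcomes` is a hypothetical helper.

```python
def scan_outcomes(scanner_output):
    """Map each scanned path to its eventOutcome and eventOutcomeDetail."""
    results = {}
    for line in scanner_output.splitlines():
        path, sep, status = line.rpartition(": ")
        if not sep:
            continue  # blank lines and separator lines
        if status == "OK":
            results[path] = {"eventOutcome": "pass", "eventOutcomeDetail": ""}
        elif status.endswith(" FOUND"):
            signature = status[: -len(" FOUND")]
            results[path] = {"eventOutcome": "fail", "eventOutcomeDetail": signature}
        # Any other "key: value" line (e.g. summary counts) is ignored.
    return results


sample = """docs/a.tif: OK
docs/b.doc: Eicar-Test-Signature FOUND

----------- SCAN SUMMARY -----------
Infected files: 1
"""
for path, outcome in scan_outcomes(sample).items():
    print(path, outcome)
```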
File characterization
- Identification: format name, format version, registry name, registry key
- Validation: is the file well-formed? Is it valid?
Appraise SIP
- The standard event elements
- Event outcome (no files removed; some files removed)
- List of files removed
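The appraisal outcome and the list of removed files both follow from comparing the SIP's contents before and after appraisal. A minimal sketch, assuming both listings are available as file names and that the removed files are recorded in eventOutcomeDetail; `appraisal_event` and its eventType string are hypothetical.

```python
def appraisal_event(files_before, files_after):
    """Summarize appraisal as an outcome plus the sorted list of removed files."""
    removed = sorted(set(files_before) - set(files_after))
    return {
        "eventType": "appraisal",  # hypothetical eventType value
        "eventOutcome": "some files removed" if removed else "no files removed",
        "eventOutcomeDetail": removed,
    }


print(appraisal_event(["a.tif", "b.tif", "c.wav"], ["a.tif", "c.wav"]))
```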
Normalization to preservation formats
- Everything already in the table, plus identification information: format name, format version, registry name, registry key
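The normalized-file table records the normalization command in eventDetail as a template of %...% placeholders. If the expanded command should be kept as well, it can be produced at normalization time. A sketch, assuming placeholders always take the %name% form; `expand_command` and `normalization_event_detail` are hypothetical helpers, and every path value below is made up for illustration.

```python
import re

# Assumption: placeholders always look like %name% with word characters only.
PLACEHOLDER = re.compile(r"%(\w+)%")

TEMPLATE = ("%convertPath% %fileFullName% +compress "
            "%preservationFileDirectory%%fileTitle%.%preservationFormat%")


def expand_command(template, values):
    """Substitute every %name% placeholder; a name missing from values raises KeyError."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def normalization_event_detail(program, version, template, values):
    """Build an eventDetail string in the program=...; version=...; command=... style."""
    return f"program={program}; version={version}; command={expand_command(template, values)}"


values = {  # hypothetical paths for illustration only
    "convertPath": "/usr/bin/convert",
    "fileFullName": "/sip/objects/photo.bmp",
    "preservationFileDirectory": "/sip/objects/preservation/",
    "fileTitle": "photo",
    "preservationFormat": "tif",
}
print(normalization_event_detail("ImageMagick", "6.6.4.0", TEMPLATE, values))
```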
Normalization to access formats
Mandatory PREMIS elements (mandatory semantic units + mandatory components)
Entity | Semantic unit | Semantic component | Present in Archivematica? |
---|---|---|---|
Object | 1.1 objectIdentifier | 1.1.1 objectIdentifierType | No |
Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | Yes |
Object | 1.2 objectCategory | none | No |
Object | 1.5 objectCharacteristics | 1.5.1 compositionLevel | No |
Object | 1.5.4 objectCharacteristics/format | Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used | |
Object | 1.7 storage | Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded." | No, but retrieval may be managed through UUIDs. |
Event | 2.1 eventIdentifier | 2.1.1 eventIdentifierType | No |
Event | 2.1 eventIdentifier | 2.1.2 eventIdentifierValue | No |
Event | 2.2 eventType | none | Partial |
Event | 2.3 eventDateTime | none | Partial |
Agent | 3.1 agentIdentifier | 3.1.1 agentIdentifierType | No |
Agent | 3.1 agentIdentifier | 3.1.2 agentIdentifierValue | No |
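The rules in this table, including the two either/or cases, can be checked mechanically before an AIP is stored. A sketch assuming each entity has been flattened into a dict keyed by semantic component name; `missing_elements` and its `uuid_is_retrieval_handle` flag are hypothetical, the flag standing in for the contentLocation exemption quoted for 1.7 storage.

```python
MANDATORY = {
    "object": ["objectIdentifierType", "objectIdentifierValue", "objectCategory", "compositionLevel"],
    "event": ["eventIdentifierType", "eventIdentifierValue", "eventType", "eventDateTime"],
    "agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def _absent(record, name):
    # Only None or "" count as missing: a compositionLevel of 0 is a valid value.
    return record.get(name) in (None, "")


def missing_elements(entity, record, uuid_is_retrieval_handle=False):
    """List the mandatory components (or either/or groups) missing from a record."""
    missing = [name for name in MANDATORY[entity] if _absent(record, name)]
    if entity == "object":
        if _absent(record, "formatDesignation") and _absent(record, "formatRegistry"):
            missing.append("formatDesignation or formatRegistry")
        # contentLocation is implicit when the UUID is used as the retrieval handle.
        implicit = uuid_is_retrieval_handle and record.get("objectIdentifierType") == "UUID"
        if not implicit and _absent(record, "contentLocation") and _absent(record, "storageMedium"):
            missing.append("contentLocation or storageMedium")
    return missing


original = {
    "objectIdentifierType": "UUID",
    "objectIdentifierValue": "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "objectCategory": "file",
    "compositionLevel": 0,
    "formatRegistry": {"formatRegistryName": "PRONOM", "formatRegistryKey": "fmt/116"},
}
print(missing_elements("object", original))  # ['contentLocation or storageMedium']
print(missing_elements("object", original, uuid_is_retrieval_handle=True))  # []
```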