Metadata elements
Revision as of 15:32, 9 September 2010 by Evelyn McLellan
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
This process involves:
- Using the InterPARES Chain of Preservation (COP) model and the COP/PREMIS crosswalk to identify the required elements for objects preserved by Archivematica
- Analyzing the existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see Existing elements)
- Comparing the required elements (first step) with the existing metadata (second step) to determine what gaps exist in Archivematica
- Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
- Structuring the required elements into the Repository eXchange Package (RXP) specification
- Determining what metadata belongs in the DIP(s)
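The comparison step is, at its core, a set difference between the required elements and those Archivematica already produces. A minimal Python sketch, assuming both lists have already been extracted as element names; the sample sets below are illustrative and are not the actual COP/PREMIS crosswalk output:

```python
# Hypothetical sketch of the gap analysis: required elements (from the
# COP model and COP/PREMIS crosswalk) minus the elements Archivematica
# already produces (from the AIP log files and METS.xml).
# Sample names are illustrative only.

REQUIRED = {
    "objectIdentifierType",
    "objectIdentifierValue",
    "objectCategory",
    "compositionLevel",
    "eventIdentifierType",
    "eventIdentifierValue",
    "eventType",
    "eventDateTime",
    "agentIdentifierType",
    "agentIdentifierValue",
}

# Elements assumed to be captured already (illustrative only).
EXISTING = {"objectIdentifierValue", "eventType", "eventDateTime"}


def find_gaps(required: set, existing: set) -> list:
    """Return the required elements that are not yet captured, sorted by name."""
    return sorted(required - existing)


for name in find_gaps(REQUIRED, EXISTING):
    print("missing:", name)
```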
Proposed PREMIS metadata for original file
This table is a template for metadata elements for the original file. Please note the following:
- The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
- The format semantic unit would be repeated as needed if FITS identified several possible formats for the file
- The eventOutcomeDetail semantic unit would be repeated as needed to capture detailed information generated by an event
- For most files, the relationships semantic unit would be used twice: once for the preservation copy and once for the access copy
- The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events
- There will be at least two agent entities: an organization, such as City of Vancouver Archives, and Archivematica. The organization would be the agent for manual events such as reviewing the SIP, while Archivematica would be the agent for automated events such as normalization.
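Because a derivation is recorded on both sides ("is source of" on the original, "has source" on the derivative), the two relationship entries can be generated together. A minimal sketch, assuming a flat Python representation of the relationship semantic unit; the access-copy UUID and the event ID "007" are hypothetical, since this page gives sample values only for the preservation copy:

```python
import uuid
from dataclasses import dataclass


@dataclass
class Relationship:
    """One PREMIS relationship semantic unit, flattened for illustration."""
    relationshipType: str
    relationshipSubType: str
    relatedObjectIdentifierType: str
    relatedObjectIdentifierValue: str
    relatedEventIdentifierType: str
    relatedEventIdentifierValue: str


def derivation_pair(source_uuid: str, derivative_uuid: str, event_id: str):
    """Return (relationship on the source, relationship on the derivative).

    Both sides point at the event that produced the derivative, since
    derivative relationships must record relatedEventIdentification.
    """
    on_source = Relationship("derivation", "is source of", "UUID",
                             derivative_uuid, "Archivematica ID", event_id)
    on_derivative = Relationship("derivation", "has source", "UUID",
                                 source_uuid, "Archivematica ID", event_id)
    return on_source, on_derivative


original = "0db50321-6d7b-4291-89ec-a8b0adc1ff96"
preservation_copy = "270bd067-0483-4c5f-bdec-f2cbd6e651aa"
access_copy = str(uuid.uuid4())  # hypothetical: no access-copy UUID is given

# The original ends up with two relationships, one per derivative.
original_relationships = [
    derivation_pair(original, preservation_copy, "006")[0],
    derivation_pair(original, access_copy, "007")[0],  # "007" is hypothetical
]
```

Generating both sides from one call keeps the pair consistent, so the original and the derivative cannot disagree about which event links them.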
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
objectIdentifier | objectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component |
objectCategory | none | file | mandatory unit and component |
significantProperties | significantPropertiesType | ImageWidth | repeatable semantic unit |
significantProperties | significantPropertiesValue | 1024 | repeatable semantic unit |
objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
objectCharacteristics | size | 787510 | |
objectCharacteristics/format/formatDesignation | formatName | Windows Bitmap | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatDesignation | formatVersion | 3.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/116 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
originalName | none | /SAE Project files/newsletters/20100223/cover image.bmp | |
relationship | relationshipType | derivation | |
relationship | relationshipSubType | is source of | |
relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object |
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component if there is a related object |
relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded." |
relationship/relatedEventIdentification | relatedEventIdentifierValue | 006 | "For derivative relationships between objects relatedEventIdentification must be recorded." |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component |
eventType | none | normalization | mandatory unit and component |
eventDateTime | none | 2009-12-01T09:09:00-02:00 | mandatory unit and component |
eventDetail | none | program=ImageMagick; version=6.6.4.0 | This element can be used to record information about software used and eliminates the need to have agent entities for software programs |
eventOutcomeInformation | eventOutcome | normalization successful | |
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | used to link an agent to an event; not mandatory but recommended |
agentIdentifier | agentIdentifierType | repository code | mandatory unit and component |
agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component |
agentName | none | City of Vancouver Archives | |
agentType | none | organization | |
agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
agentIdentifier | agentIdentifierValue | Archivematica-0.6 | mandatory unit and component |
agentName | none | Archivematica | |
agentType | none | software |
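The sample values above can be serialized as a PREMIS object. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the PREMIS 2.x namespace info:lc/xmlns/premis-v2 and expressing objectCategory as xsi:type="premis:file"; the output has not been validated against the PREMIS schema, and the function name and parameters are hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 2.x namespace; adjust to the schema version actually used.
PREMIS = "info:lc/xmlns/premis-v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS)
ET.register_namespace("xsi", XSI)


def sub(parent, tag, text=None):
    """Append a PREMIS-namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{PREMIS}}}{tag}")
    if text is not None:
        el.text = str(text)
    return el


def original_file_object(object_uuid, md5, size, original_name, fmt,
                         properties, derivation):
    """Build a simplified premis:object for an original file.

    fmt: (formatName, formatVersion, PRONOM key)
    properties: list of (significantPropertiesType, significantPropertiesValue)
    derivation: (relationshipSubType, related object UUID, related event ID)
    """
    obj = ET.Element(f"{{{PREMIS}}}object")
    obj.set(f"{{{XSI}}}type", "premis:file")  # objectCategory = file
    ident = sub(obj, "objectIdentifier")
    sub(ident, "objectIdentifierType", "UUID")
    sub(ident, "objectIdentifierValue", object_uuid)
    for ptype, pvalue in properties:  # repeatable, one per FITS property
        sp = sub(obj, "significantProperties")
        sub(sp, "significantPropertiesType", ptype)
        sub(sp, "significantPropertiesValue", pvalue)
    chars = sub(obj, "objectCharacteristics")
    sub(chars, "compositionLevel", 0)
    fixity = sub(chars, "fixity")
    sub(fixity, "messageDigestAlgorithm", "MD5")
    sub(fixity, "messageDigest", md5)
    sub(fixity, "messageDigestOriginator", "City of Vancouver Archives")
    sub(chars, "size", size)
    fmt_el = sub(chars, "format")
    designation = sub(fmt_el, "formatDesignation")
    sub(designation, "formatName", fmt[0])
    sub(designation, "formatVersion", fmt[1])
    registry = sub(fmt_el, "formatRegistry")
    sub(registry, "formatRegistryName", "PRONOM")
    sub(registry, "formatRegistryKey", fmt[2])
    sub(obj, "originalName", original_name)
    rel = sub(obj, "relationship")
    sub(rel, "relationshipType", "derivation")
    sub(rel, "relationshipSubType", derivation[0])
    related_obj = sub(rel, "relatedObjectIdentification")
    sub(related_obj, "relatedObjectIdentifierType", "UUID")
    sub(related_obj, "relatedObjectIdentifierValue", derivation[1])
    related_event = sub(rel, "relatedEventIdentification")
    sub(related_event, "relatedEventIdentifierType", "Archivematica ID")
    sub(related_event, "relatedEventIdentifierValue", derivation[2])
    return obj


obj = original_file_object(
    "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "e479688508922354bdab09bca60d8d0e",
    787510,
    "/SAE Project files/newsletters/20100223/cover image.bmp",
    ("Windows Bitmap", "3.0", "fmt/116"),
    [("ImageWidth", "1024")],
    ("is source of", "270bd067-0483-4c5f-bdec-f2cbd6e651aa", "006"),
)
print(ET.tostring(obj, encoding="unicode"))
```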
Proposed PREMIS metadata for normalized file (preservation copy)
Unlike the table above, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and checksum generation.
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
objectIdentifier | objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component |
objectCategory | none | file | mandatory unit and component |
objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
objectCharacteristics/format/formatDesignation | formatName | Tagged Image File Format | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatDesignation | formatVersion | 6.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/10 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
relationship | relationshipType | derivation | |
relationship | relationshipSubType | has source | |
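relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object |
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component if there is a related object |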
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component |
eventType | none | creation | mandatory unit and component |
eventDateTime | none | 2010-08-01T09:08:44-03:00 | mandatory unit and component |
eventDetail | none | program=ImageMagick; version=6.6.4.0; command=%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat% | |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
eventIdentifier | eventIdentifierValue | 002 | mandatory unit and component |
eventType | none | message digest calculation | mandatory unit and component |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | mandatory unit and component |
eventDetail | none | program=MD5deep; version=3.6 | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | used to link an agent to an event; not mandatory but recommended |
agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
agentIdentifier | agentIdentifierValue | Archivematica-0.6 | mandatory unit and component |
agentName | none | Archivematica | |
agentType | none | software |
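The eventDetail for the creation event above records the ImageMagick command as a template with %name% placeholders. If the concrete command line is also wanted, the placeholders can be expanded per file. A minimal sketch, assuming the placeholder values shown are hypothetical and that the command is later run through a POSIX shell (hence shlex.quote, which matters for names such as "cover image.bmp"):

```python
import re
import shlex

# Matches placeholders such as %fileFullName% in the command template.
PLACEHOLDER = re.compile(r"%(\w+)%")

TEMPLATE = ("%convertPath% %fileFullName% +compress "
            "%preservationFileDirectory%%fileTitle%.%preservationFormat%")


def expand_command(template: str, values: dict) -> str:
    """Substitute every %name% placeholder, shell-quoting each value."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for %{name}%")
        return shlex.quote(values[name])
    return PLACEHOLDER.sub(replace, template)


def creation_event_detail(command: str) -> str:
    """Format eventDetail in the key=value style used in the tables above."""
    return f"program=ImageMagick; version=6.6.4.0; command={command}"


# Hypothetical values for the original file in the first table.
values = {
    "convertPath": "convert",
    "fileFullName": "cover image.bmp",
    "preservationFileDirectory": "preservation/",
    "fileTitle": "cover image",
    "preservationFormat": "tif",
}
print(creation_event_detail(expand_command(TEMPLATE, values)))
```

A placeholder with no supplied value raises KeyError rather than leaving a literal %name% in the recorded command.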
Events requiring metadata
Receive SIP (SIP gets placed in 1-receiveSIP)
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ingestion | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Check checksums
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
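Checking checksums amounts to recomputing each file's digest and comparing it with the value supplied with the SIP. A minimal sketch, assuming the checksums arrive as an md5sum-style manifest (one "digest  relative/path" line per file) and using a hypothetical pass/fail/missing vocabulary for eventOutcome:

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest: Path, base_dir: Path) -> dict:
    """Compare a 'digest  relative/path' manifest with the files on disk.

    Returns {relative path: "pass" | "fail" | "missing"}.
    """
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(None, 1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = base_dir / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of(target) == expected.lower():
            results[name] = "pass"
        else:
            results[name] = "fail"
    return results
```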
Generate checksums
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
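Generating a checksum yields two pieces of metadata at once: the fixity values on the object and a "message digest calculation" event. A minimal sketch in Python using hashlib; the dictionary layout, the eventDetail string and the caller-supplied event ID are assumptions, while the eventType and linking agent values follow the normalized-file table above:

```python
import hashlib
from datetime import datetime, timezone
from typing import BinaryIO, Optional, Tuple


def generate_checksum(stream: BinaryIO, event_id: str,
                      when: Optional[datetime] = None) -> Tuple[dict, dict]:
    """Hash a binary stream with MD5; return (fixity, event) dictionaries."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        digest.update(chunk)
    if when is None:
        # Local time with its UTC offset, matching the sample eventDateTime values.
        when = datetime.now(timezone.utc).astimezone()
    fixity = {
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
    }
    event = {
        "eventIdentifierType": "Archivematica ID",
        "eventIdentifierValue": event_id,
        "eventType": "message digest calculation",
        "eventDateTime": when.isoformat(timespec="seconds"),
        # Hypothetical: describes this sketch, not the MD5deep tool in the table.
        "eventDetail": "program=Python hashlib",
        "linkingAgentIdentifierType": "preservation system",
        "linkingAgentIdentifierValue": "Archivematica-0.6",
    }
    return fixity, event
```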
Review SIP
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Place SIP in quarantine
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Remove SIP from quarantine
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Unpack zipped files
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Assign UUIDs
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 |
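Assigning UUIDs can be sketched as walking the SIP and generating a version 4 (random) UUID per file, which then becomes the objectIdentifierValue. A minimal sketch, assuming the SIP is a directory on local disk; the function names are hypothetical:

```python
import uuid
from pathlib import Path


def assign_uuids(sip_dir: Path) -> dict:
    """Map each file's path (relative to the SIP) to a new version 4 UUID."""
    return {
        str(path.relative_to(sip_dir)): str(uuid.uuid4())
        for path in sorted(sip_dir.rglob("*"))
        if path.is_file()
    }


def object_identifier(value: str) -> dict:
    """objectIdentifier semantic unit as used in the tables above."""
    return {"objectIdentifierType": "UUID", "objectIdentifierValue": value}
```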
Remove prohibited characters
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
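Which characters count as prohibited is not defined on this page. A minimal sketch, assuming anything other than ASCII letters, digits, '.', '_' and '-' is prohibited and is replaced with '_'; the original path should still be kept, for example in originalName, as in the first table:

```python
import re

# Assumption: everything except ASCII letters, digits, '.', '_' and '-'
# is prohibited; the real rule set is not specified on this page.
PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def clean_name(name: str) -> str:
    """Replace each prohibited character in one path component with '_'."""
    return PROHIBITED.sub("_", name)


def clean_path(path: str) -> str:
    """Clean every component of a '/'-separated path, keeping the separators."""
    return "/".join(clean_name(part) for part in path.split("/"))


def rename_record(original_path: str) -> dict:
    """Pair the cleaned path with the original so originalName is not lost."""
    return {"originalName": original_path, "cleanedName": clean_path(original_path)}
```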
Virus scan
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
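The page does not name a virus scanner. Assuming ClamAV's clamscan, whose documented exit codes are 0 (no virus found), 1 (virus found) and 2 (error), the scan result can be mapped to eventOutcomeInformation as follows; the pass/fail/error vocabulary is hypothetical:

```python
import subprocess

# Assumption: ClamAV's clamscan is the scanner.
# Exit codes: 0 = no virus found, 1 = virus found, 2 = an error occurred.
OUTCOMES = {0: "pass", 1: "fail"}


def outcome_from_exit_code(code: int) -> str:
    """Map a clamscan exit code to an eventOutcome value."""
    return OUTCOMES.get(code, "error")


def scan(path: str) -> dict:
    """Run clamscan on one path and return eventOutcomeInformation fields."""
    proc = subprocess.run(["clamscan", "--no-summary", path],
                          capture_output=True, text=True)
    return {
        "eventOutcome": outcome_from_exit_code(proc.returncode),
        "eventOutcomeDetailNote": proc.stdout.strip(),
    }
```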
File identification
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Validation
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Normalization to preservation formats
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Generation of access copy
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Creation (for normalized files)
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
eventType | none | ||
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Mandatory PREMIS elements (mandatory semantic units + mandatory components)
Entity | Semantic unit | Semantic component | Present in Archivematica? |
---|---|---|---|
Object | 1.1 objectIdentifier | 1.1.1 objectIdentifierType | No |
Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | Yes |
Object | 1.2 objectCategory | none | No |
Object | 1.5 objectCharacteristics | 1.5.1 compositionLevel | No |
Object | 1.5.4 objectCharacteristics/format | Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used | |
Object | 1.7 storage | Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded." | No, but retrieval may be managed through UUIDs. |
Event | 2.1 eventIdentifier | 2.1.1 eventIdentifierType | No |
Event | 2.1 eventIdentifier | 2.1.2 eventIdentifierValue | No |
Event | 2.2 eventType | none | Partial |
Event | 2.3 eventDateTime | none | Partial |
Agent | 3.1 agentIdentifier | 3.1.1 agentIdentifierType | No |
Agent | 3.1 agentIdentifier | 3.1.2 agentIdentifierValue | No |
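The rules in this table, including the two either/or requirements, can be checked mechanically. A minimal sketch, assuming each entity is represented as a flat dictionary keyed by semantic-unit paths; this representation and the function names are hypothetical:

```python
# Mandatory semantic units/components per entity, taken from the table above.
MANDATORY = {
    "object": [
        "objectIdentifier/objectIdentifierType",
        "objectIdentifier/objectIdentifierValue",
        "objectCategory",
        "objectCharacteristics/compositionLevel",
    ],
    "event": [
        "eventIdentifier/eventIdentifierType",
        "eventIdentifier/eventIdentifierValue",
        "eventType",
        "eventDateTime",
    ],
    "agent": [
        "agentIdentifier/agentIdentifierType",
        "agentIdentifier/agentIdentifierValue",
    ],
}

# "Either ... or ..." requirements for objects.
EITHER = [
    ("objectCharacteristics/format/formatDesignation",
     "objectCharacteristics/format/formatRegistry"),
    ("storage/contentLocation", "storage/storageMedium"),
]


def has(record: dict, path: str) -> bool:
    """True if the record contains the path itself or anything beneath it."""
    return any(key == path or key.startswith(path + "/") for key in record)


def missing_elements(entity: str, record: dict,
                     identifier_is_handle: bool = False) -> list:
    """List the mandatory elements absent from a flat entity record."""
    problems = [p for p in MANDATORY[entity] if not has(record, p)]
    if entity == "object":
        for first, second in EITHER:
            if first.startswith("storage/") and identifier_is_handle:
                continue  # contentLocation is implicit when the identifier is the handle
            if not (has(record, first) or has(record, second)):
                problems.append(f"{first} or {second}")
    return problems
```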