Difference between revisions of "Metadata elements"
Revision as of 15:03, 4 November 2010
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
Design process
This process involves:
- Using the InterPARES Chain of Preservation (COP) model and the CoP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
- Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see Existing elements)
- Comparing the required elements with the existing metadata in order to determine what gaps exist in Archivematica
- Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
- Structuring the required elements into the Repository eXchange Package (RXP) specification
- Determining what metadata belong in the DIP(s)
Proposed PREMIS metadata for original file
This table is a template for metadata elements for the original file. Please note the following:
- The format semantic unit would be repeated as needed if FITS identified several possible formats for the file
- The eventOutcomeDetail semantic unit would be repeated as needed to capture detailed information generated by an event
- This table includes one event as an example (normalization); a real PREMIS file would contain information about numerous events (see Event metadata, below)
- This table includes two agent entities: an organization (City of Vancouver Archives) and a software program (Archivematica). The organization is the agent for manual events such as reviewing the SIP, while Archivematica is the agent for automated events such as normalization. Further agents may be included (such as individuals or workstations), but the two agents specified in this table are the minimum.
PREMIS entity | Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|---|
object | objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
object | objectIdentifier | objectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component |
object | objectCategory | none | file | mandatory unit and component |
object | objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
object | objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
object | objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
object | objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
object | objectCharacteristics | size | 787510 | |
object | objectCharacteristics/format/formatDesignation | formatName | Windows Bitmap | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatDesignation | formatVersion | 3.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/116 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics | objectCharacteristicsExtension | <fits xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.3.2" timestamp="8/10/10 7:28 PM"> + selected FITS output | objectCharacteristicsExtension is used for additional object characteristics not covered by PREMIS, for instance format specific metadata that is defined externally. |
object | originalName | none | /SAE Project files/newsletters/20100223/cover image.bmp | |
object | relationship | relationshipType | derivation | |
object | relationship | relationshipSubType | is source of | |
object | relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object |
object | relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component if there is a related object |
object | relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded." |
object | relationship/relatedEventIdentification | relatedEventIdentifierValue | [alphanumeric code] | "For derivative relationships between objects relatedEventIdentification must be recorded." |
event | eventIdentifier | eventIdentifierType | UUID | mandatory unit and component |
event | eventIdentifier | eventIdentifierValue | 8jb50321-6d7b-4291-89ag-a8b0fhc1f276 | mandatory unit and component |
event | eventType | none | normalization | mandatory unit and component |
event | eventDateTime | none | 2009-12-01T09:09:00-02:00 | mandatory unit and component |
event | eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" | This element can be used to record information about software used and eliminates the need to have agent entities for software programs |
event | eventOutcomeInformation | eventOutcome | {normalized; not normalized} | |
event | eventOutcomeDetail | eventOutcomeDetailNote | | Repeatable container |
event | eventOutcomeDetail | eventOutcomeDetailNote | cover_image.tiff | Repeatable container |
event | linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
event | linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 | used to link an agent to an event; not mandatory but recommended |
agent | agentIdentifier | agentIdentifierType | repository code | mandatory unit and component |
agent | agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component |
agent | agentName | none | City of Vancouver Archives | |
agent | agentType | none | organization | |
agent | agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
agent | agentIdentifier | agentIdentifierValue | Archivematica-0.7 | mandatory unit and component |
agent | agentName | none | Archivematica | |
agent | agentType | none | software |
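The object rows in the table above map onto nested XML elements. The following is a minimal sketch of how they might be serialized with Python's standard library; it is not Archivematica's implementation. The names `build_premis_object` and `_sub` are hypothetical, the namespace URI assumes the PREMIS 2.x schema, and only a subset of the semantic units is shown (objectCategory, format registry and relationship information are left out for brevity).

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 2.x namespace; adjust for the schema version in use.
PREMIS_NS = "info:lc/xmlns/premis-v2"


def _sub(parent, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        child.text = str(text)
    return child


def build_premis_object(uuid_value, md5_hex, originator, size,
                        format_name, format_version, original_name):
    """Build an <object> element from semantic units listed in the table."""
    obj = ET.Element(f"{{{PREMIS_NS}}}object")

    identifier = _sub(obj, "objectIdentifier")
    _sub(identifier, "objectIdentifierType", "UUID")
    _sub(identifier, "objectIdentifierValue", uuid_value)

    characteristics = _sub(obj, "objectCharacteristics")
    _sub(characteristics, "compositionLevel", 0)

    fixity = _sub(characteristics, "fixity")
    _sub(fixity, "messageDigestAlgorithm", "MD5")
    _sub(fixity, "messageDigest", md5_hex)
    _sub(fixity, "messageDigestOriginator", originator)

    _sub(characteristics, "size", size)

    designation = _sub(_sub(characteristics, "format"), "formatDesignation")
    _sub(designation, "formatName", format_name)
    _sub(designation, "formatVersion", format_version)

    _sub(obj, "originalName", original_name)
    return obj


# Sample values taken from the table above.
sample = build_premis_object(
    uuid_value="0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    md5_hex="e479688508922354bdab09bca60d8d0e",
    originator="City of Vancouver Archives",
    size=787510,
    format_name="Windows Bitmap",
    format_version="3.0",
    original_name="/SAE Project files/newsletters/20100223/cover image.bmp",
)
print(ET.tostring(sample, encoding="unicode"))
```

Event and agent entities could be built the same way, with one helper call per semantic component.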
Proposed PREMIS metadata for normalized file (preservation copy)
Unlike the table above, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and checksum generation.
PREMIS entity | Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|---|
object | objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
object | objectIdentifier | objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component |
object | objectCategory | none | file | mandatory unit and component |
object | objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
object | objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
object | objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
object | objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
object | objectCharacteristics/format/formatDesignation | formatName | Tagged Image File Format | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatDesignation | formatVersion | 6.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/10 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
object | relationship | relationshipType | derivation | |
object | relationship | relationshipSubType | has source | |
object | relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | |
object | relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | |
object | relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded." |
object | relationship/relatedEventIdentification | relatedEventIdentifierValue | [alphanumeric code] | "For derivative relationships between objects relatedEventIdentification must be recorded." |
event | eventIdentifier | eventIdentifierType | UUID | mandatory unit and component |
event | eventIdentifier | eventIdentifierValue | 05y50321-6d7b-4291-89ag-a8b0fhc1f286 | mandatory unit and component |
event | eventType | none | creation | mandatory unit and component |
event | eventDateTime | none | 2010-08-01T09:08:44-03:00 | mandatory unit and component |
event | eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" | |
event | linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
event | linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 | used to link an agent to an event; not mandatory but recommended |
event | eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
event | eventIdentifier | eventIdentifierValue | [alphanumeric code] | mandatory unit and component |
event | eventType | none | message digest calculation | mandatory unit and component |
event | eventDateTime | none | 2010-08-01T09:08:46-01:00 | mandatory unit and component |
event | eventDetail | none | program="MD5deep"; version="3.6" | |
event | linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
event | linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 | used to link an agent to an event; not mandatory but recommended |
agent | agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
agent | agentIdentifier | agentIdentifierValue | Archivematica-0.7 | mandatory unit and component |
agent | agentName | none | Archivematica | |
agent | agentType | none | software |
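The original file and its preservation copy describe the same derivation from opposite sides: the original "is source of" the normalized file, and the normalized file "has source" the original. The sketch below checks that two such records point at each other consistently. The dictionary layout and the function name `is_reciprocal` are assumptions made for illustration only.

```python
# Inverse pairs for the relationshipSubType values used in the two tables.
INVERSE_SUBTYPE = {
    "is source of": "has source",
    "has source": "is source of",
}


def is_reciprocal(record_a, record_b):
    """Return True if the two records describe the same derivation from both sides."""
    return (
        record_a["relatedObjectIdentifierValue"] == record_b["objectIdentifierValue"]
        and record_b["relatedObjectIdentifierValue"] == record_a["objectIdentifierValue"]
        and INVERSE_SUBTYPE.get(record_a["relationshipSubType"])
        == record_b["relationshipSubType"]
    )


# Sample values taken from the two tables above.
original = {
    "objectIdentifierValue": "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "relationshipSubType": "is source of",
    "relatedObjectIdentifierValue": "270bd067-0483-4c5f-bdec-f2cbd6e651aa",
}
normalized = {
    "objectIdentifierValue": "270bd067-0483-4c5f-bdec-f2cbd6e651aa",
    "relationshipSubType": "has source",
    "relatedObjectIdentifierValue": "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
}
print(is_reciprocal(original, normalized))
```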
Event metadata
Receive SIP (SIP gets placed in 1-receiveSIP)
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 83n50321-6d7b-3847-89ag-a8b0fhc1f273 | |
eventType | none | ingestion | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository code | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Check checksums
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 21h50321-6d7b-3855-89ag-a8b0fhc1f256 | |
eventType | none | fixity check | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="MD5Deep"; version="3.6" | |
eventOutcomeInformation | eventOutcome | {pass; fail} | |
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
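A fixity check recomputes a file's digest and compares it with the stored value. The table names MD5Deep as the program; the sketch below uses Python's `hashlib` instead, purely to illustrate how the {pass; fail} outcome could be derived. The function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, expected_md5):
    """Return 'pass' if the recomputed digest matches the stored one, else 'fail'."""
    return "pass" if md5_of_file(path) == expected_md5.lower() else "fail"
```

Reading in chunks keeps memory use constant for large files; the returned string can be recorded directly as the eventOutcome value.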
Generate checksums
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 0hc50321-6d7b-3847-89ag-a8b0fhc1f245 | |
eventType | none | message digest calculation | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="MD5Deep"; version="3.6" | |
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | e479688508922354bdab09bca60d8d0e | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Review SIP
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | 22n50321-6d7b-3847-89ag-a8b0fhc1f288 | |
eventType | none | SIP review | |
eventDateTime | none | [date - may not be automatically generated] | |
eventDetail | none | [free text field - could include information about the Submission Information Agreement against which the SIP was checked, etc.] | |
eventOutcomeInformation | eventOutcome | {pass; conditional pass} | |
eventOutcomeDetail | eventOutcomeDetailNote | | |
linkingAgentIdentifier | linkingAgentIdentifierType | repository code | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
Place SIP in quarantine
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 22m50321-6d7b-4637-89ag-a8b0fhc1f234 | |
eventType | none | quarantine | |
eventDateTime | none | 2010-09-01T09:09:00-02:00/2010-10-01T09:09:00-02:00 | This is a start date and end date (date ranges are permitted by PREMIS) |
eventDetail | none | ||
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | repository code | |
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA |
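The quarantine eventDateTime above is a start/end pair separated by a slash. As an illustration, such a value could be produced from two timezone-aware timestamps as follows; the function name `event_date_range` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def event_date_range(start, end):
    """Format a start/end pair as 'start/end' using ISO 8601 timestamps."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must carry a UTC offset")
    if end < start:
        raise ValueError("end must not precede start")
    return f"{start.isoformat()}/{end.isoformat()}"


# Sample values taken from the quarantine table above.
offset = timezone(timedelta(hours=-2))
quarantine_period = event_date_range(
    datetime(2010, 9, 1, 9, 9, 0, tzinfo=offset),
    datetime(2010, 10, 1, 9, 9, 0, tzinfo=offset),
)
print(quarantine_period)
```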
Unpack zipped files
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 12j50321-6d7b-0047-89ag-a8b0fhc1f211 | |
eventType | none | unpack | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="easyextract"; version="0.1.0" | |
eventOutcomeInformation | eventOutcome | {unpacking required; unpacking not required} | |
eventOutcomeDetail | eventOutcomeDetailNote | unpacked Newsletter.zip | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Assign UUIDs
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 90n50321-6d7b-6453-89ag-a8b0fhc1f250 | |
eventType | none | identifier assignment | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="UUID"; version="1.6.2" | |
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
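The table names a program called "UUID" for identifier assignment. As a rough equivalent, Python's standard `uuid` module can generate and validate identifiers; the function names below are hypothetical. Note that the placeholder event identifiers used in these tables contain non-hexadecimal characters, so they would not pass this validation, whereas the object identifiers do.

```python
import uuid


def new_identifier():
    """Return a random (version 4) UUID in its canonical 36-character form."""
    return str(uuid.uuid4())


def is_valid_uuid(value):
    """Return True if the string parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(is_valid_uuid("270bd067-0483-4c5f-bdec-f2cbd6e651aa"))  # object identifier
print(is_valid_uuid("90n50321-6d7b-6453-89ag-a8b0fhc1f250"))  # placeholder event identifier
```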
Remove prohibited characters
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 83n50321-6d7b-3847-89ag-a8b0fhc1f273 | |
eventType | none | name cleanup | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="detox"; version="1.2.0-1" | |
eventOutcomeInformation | eventOutcome | cleanup required | |
eventOutcomeDetail | eventOutcomeDetailNote | Original name="cover image.bmp"; cleaned up name="cover_image.bmp" | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
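The name cleanup step can be sketched as a character substitution that also produces the eventOutcomeDetailNote in the form shown above. This is not how detox works internally: the set of permitted characters, the function names and the "cleanup not required" outcome value are all assumptions made for illustration.

```python
import re

# Assumption: anything other than letters, digits, dot, underscore and hyphen
# counts as a prohibited character.
_PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def clean_name(name):
    """Replace each prohibited character with an underscore."""
    return _PROHIBITED.sub("_", name)


def cleanup_event(name):
    """Return the outcome and detail note for a name cleanup event."""
    cleaned = clean_name(name)
    if cleaned == name:
        # Hypothetical outcome value; the table only shows "cleanup required".
        return {"eventOutcome": "cleanup not required", "eventOutcomeDetailNote": ""}
    note = f'Original name="{name}"; cleaned up name="{cleaned}"'
    return {"eventOutcome": "cleanup required", "eventOutcomeDetailNote": note}


print(cleanup_event("cover image.bmp"))
```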
Scan for viruses
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | Archivematica ID | |
eventIdentifier | eventIdentifierValue | 09n50321-6d7b-3596-89ag-a8b0fhc1f288 | |
eventType | none | virus check | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="Clam AV"; version="0.95.2" | |
eventOutcomeInformation | eventOutcome | pass | |
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Identify format
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 16n50321-6d7b-3847-89ag-a8b0fhc1f299 | |
eventType | none | format identification | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="File Information Toolset"; version="0.2.6" | |
eventOutcomeInformation | eventOutcome | {positive; tentative; unidentified} | |
eventOutcomeDetail | eventOutcomeDetailNote | fmt/116 | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Validate format
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 33n50321-6d7b-3888-89ag-a8b0fhc1f264 | |
eventType | none | validation | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="File Information Toolset"; version="0.2.6" | |
eventOutcomeInformation | eventOutcome | {pass; partial pass; fail} | |
eventOutcomeDetail | eventOutcomeDetailNote | format="Windows Bitmap"; version="3.0"; result="Well-formed and valid" | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Normalize to preservation format
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 05n50321-6d7b-3447-89ag-a8b0fhc1f274 | |
eventType | none | normalization | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" | |
eventOutcomeInformation | eventOutcome | {normalized; not normalized} | |
eventOutcomeDetail | eventOutcomeDetailNote | | |
eventOutcomeDetail | eventOutcomeDetailNote | cover_image.tiff | |
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
Create file
This event is recorded only for preservation and access copies, not for original files
Semantic unit | Semantic component | Sample value(s) | Notes |
---|---|---|---|
eventIdentifier | eventIdentifierType | UUID | |
eventIdentifier | eventIdentifierValue | 55n50321-6d7b-3987-89ag-a8b0fhc1f212 | |
eventType | none | creation | |
eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" | |
eventOutcomeInformation | eventOutcome | ||
eventOutcomeDetail | eventOutcomeDetailNote | ||
linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.7 |
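Most eventDetail values in these tables follow a `key="value"; key="value"` pattern. A small sketch of writing and reading that pattern is given below, assuming values never contain double quotes; both function names are hypothetical.

```python
import re

# Matches one key="value" pair; assumes values contain no double quotes.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def format_event_detail(**fields):
    """Join keyword arguments into a 'key="value"; key="value"' string."""
    return "; ".join(f'{key}="{value}"' for key, value in fields.items())


def parse_event_detail(text):
    """Parse a 'key="value"; ...' string back into a dictionary."""
    return dict(_PAIR.findall(text))


detail = format_event_detail(program="ImageMagick", version="6.6.4.0")
print(detail)
print(parse_event_detail(detail))
```

Keeping the quoting consistent across events is what makes the field machine-readable; unquoted values would need a different parser.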
Sample PREMIS file
A sample PREMIS file has been mocked up and is available at http://www.archivematica.org/downloads/archivematica-digiprov-sample-v1.xml.
Sources consulted
- PREMIS data dictionary for preservation metadata, version 2.0, March 2008, http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
- Metadata Encoding and Transmission Standard (METS): primer and reference manual, revised April 2010, http://www.loc.gov/standards/mets/METSPrimerRevised.pdf
- Repository eXchange Package (RXP) Spec, Version 0.96, http://wiki.fcla.edu:8000/TIPR/21
- Chain of preservation model, InterPARES, consultation draft, August, 2007, http://www.interpares.org/ip2/ip2_model_display.cfm?model=cop
- Conformant implementation of the PREMIS data dictionary, PREMIS Editorial Committee, October 2010, http://www.loc.gov/standards/premis/premis-conformance-oct2010.pdf
- A checklist for documenting PREMIS-METS decisions in a PREMIS profile, May 2010, Sally Vermaaten, OCLC, http://www.loc.gov/standards/premis/premis_mets_checklist.pdf
- A checklist and a case for documenting PREMIS-METS Decisions in a METS profile, Sally Vermaaten, D-Lib Magazine, September/October 2010, http://www.dlib.org/dlib/september10/vermaaten/09vermaaten.html