Metadata elements

From Archivematica
Revision as of 17:52, 5 November 2010 by Evelyn McLellan


This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.

Design process

This process involves:

  1. Using the InterPARES Chain of Preservation (COP) model and the COP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
  2. Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see Existing elements)
  3. Comparing the results of steps 1 and 2 to determine what gaps exist in Archivematica
  4. Filling in the gaps, e.g., by modifying the workflow to produce and/or capture missing elements
  5. Structuring the required elements into the Repository eXchange Package (RXP) specification
  6. Determining what metadata belong in the DIP(s)
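Step 3 amounts to a set difference between the elements required by the crosswalk and the elements Archivematica already produces. A minimal Python sketch, in which both element lists are hypothetical placeholders rather than the actual results of steps 1 and 2:

```python
# Hypothetical element lists for illustration only; the real lists come from
# the COP/PREMIS crosswalk (step 1) and the AIP log/METS analysis (step 2).
required = {
    "objectIdentifier", "objectCategory", "fixity", "format",
    "originalName", "eventIdentifier", "eventType", "eventDateTime",
}
existing = {
    "objectIdentifier", "fixity", "originalName", "eventType",
}


def find_gaps(required_elements, existing_elements):
    """Return, sorted, the required elements that are not yet captured (step 3)."""
    return sorted(required_elements - existing_elements)


gaps = find_gaps(required, existing)
print(gaps)  # ['eventDateTime', 'eventIdentifier', 'format', 'objectCategory']
```

Each element in the result is a candidate for step 4, where the workflow is changed to produce or capture it.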



Proposed PREMIS metadata for original file

This table is a template for metadata elements for the original file. Please note the following:

  • The format semantic unit would be repeated as needed if FITS identified several possible formats for the file
  • The eventOutcomeDetail semantic unit would be repeated as needed to capture detailed information generated by an event
  • This table includes one event as an example (normalization); a real PREMIS file would contain information about numerous events (see Event metadata, below)
  • This table includes two agent entities: an organization (City of Vancouver Archives) and a software program (Archivematica). The organization is the agent for manual events such as reviewing the SIP, while Archivematica is the agent for automated events such as normalization. Further agents (such as individuals or workstations) may be included, but the two agents specified in this table are the minimum.


PREMIS entity Semantic unit Semantic component Sample value(s) Notes
object objectIdentifier objectIdentifierType UUID mandatory unit and component
object objectIdentifier objectIdentifierValue 0db50321-6d7b-4291-89ec-a8b0adc1ff96 mandatory unit and component
object objectCategory none file mandatory unit and component
object objectCharacteristics compositionLevel 0 mandatory unit and component
object objectCharacteristics/fixity messageDigestAlgorithm MD5
object objectCharacteristics/fixity messageDigest e479688508922354bdab09bca60d8d0e
object objectCharacteristics/fixity messageDigestOriginator City of Vancouver Archives
object objectCharacteristics size 787510
object objectCharacteristics/format/formatDesignation formatName Windows Bitmap format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatDesignation formatVersion 3.0 format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatRegistry formatRegistryName PRONOM format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatRegistry formatRegistryKey fmt/116 format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics objectCharacteristicsExtension

<fits xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.3.2" timestamp="8/10/10 7:28 PM"> + selected FITS output

objectCharacteristicsExtension is used for additional object characteristics not covered by PREMIS, for instance externally defined format-specific metadata.
object originalName none /SAE Project files/newsletters/20100223/cover image.bmp
object relationship relationshipType derivation
object relationship relationshipSubType is source of
object relationship/relatedObjectIdentification relatedObjectIdentifierType UUID mandatory unit and component if there is a related object
object relationship/relatedObjectIdentification relatedObjectIdentifierValue 270bd067-0483-4c5f-bdec-f2cbd6e651aa mandatory unit and component if there is a related object
object relationship/relatedEventIdentification relatedEventIdentifierType Archivematica ID "For derivative relationships between objects relatedEventIdentification must be recorded."
object relationship/relatedEventIdentification relatedEventIdentifierValue [alphanumeric code] "For derivative relationships between objects relatedEventIdentification must be recorded."
event eventIdentifier eventIdentifierType UUID mandatory unit and component
event eventIdentifier eventIdentifierValue 8jb50321-6d7b-4291-89ag-a8b0fhc1f276 mandatory unit and component
event eventType none normalization mandatory unit and component
event eventDateTime none 2009-12-01T09:09:00-02:00 mandatory unit and component
event eventDetail none program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" This element can be used to record information about software used and eliminates the need to have agent entities for software programs
event eventOutcomeInformation eventOutcome {normalized; not normalized}
event eventOutcomeDetail eventOutcomeDetailNote
  • Normalization failed
  • Already in preservation format. No need to normalize
Repeatable container
event eventOutcomeDetail eventOutcomeDetailNote cover_image.tiff Repeatable container
event linkingAgentIdentifier linkingAgentIdentifierType preservation system used to link an agent to an event; not mandatory but recommended
event linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7 used to link an agent to an event; not mandatory but recommended
agent agentIdentifier agentIdentifierType repository code mandatory unit and component
agent agentIdentifier agentIdentifierValue CVA mandatory unit and component
agent agentName none City of Vancouver Archives
agent agentType none organization
agent agentIdentifier agentIdentifierType preservation system mandatory unit and component
agent agentIdentifier agentIdentifierValue Archivematica-0.7 mandatory unit and component
agent agentName none Archivematica
agent agentType none software
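As an illustration of how the object rows above nest, the following Python sketch serializes the sample values for the original file with the standard-library xml.etree.ElementTree module. It assumes the PREMIS 2 namespace (info:lc/xmlns/premis-v2) and mirrors the table rows rather than the exact PREMIS XML schema (for example, the PREMIS 2 schema carries objectCategory as an xsi:type attribute on object). The helper names are hypothetical and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def sub(parent, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    el = ET.SubElement(parent, q(tag))
    if text is not None:
        el.text = text
    return el


def build_object(values):
    """Build a simplified PREMIS object fragment that follows the table rows."""
    obj = ET.Element(q("object"))
    ident = sub(obj, "objectIdentifier")
    sub(ident, "objectIdentifierType", "UUID")
    sub(ident, "objectIdentifierValue", values["uuid"])
    sub(obj, "objectCategory", "file")  # simplification, see lead-in
    chars = sub(obj, "objectCharacteristics")
    sub(chars, "compositionLevel", "0")
    fixity = sub(chars, "fixity")
    sub(fixity, "messageDigestAlgorithm", "MD5")
    sub(fixity, "messageDigest", values["md5"])
    sub(fixity, "messageDigestOriginator", values["originator"])
    sub(chars, "size", str(values["size"]))
    fmt = sub(chars, "format")
    designation = sub(fmt, "formatDesignation")
    sub(designation, "formatName", values["format_name"])
    sub(designation, "formatVersion", values["format_version"])
    registry = sub(fmt, "formatRegistry")
    sub(registry, "formatRegistryName", "PRONOM")
    sub(registry, "formatRegistryKey", values["puid"])
    sub(obj, "originalName", values["original_name"])
    return obj


original = build_object({
    "uuid": "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "md5": "e479688508922354bdab09bca60d8d0e",
    "originator": "City of Vancouver Archives",
    "size": 787510,
    "format_name": "Windows Bitmap",
    "format_version": "3.0",
    "puid": "fmt/116",
    "original_name": "/SAE Project files/newsletters/20100223/cover image.bmp",
})
print(ET.tostring(original, encoding="unicode"))
```

The same function could produce the normalized file's object by passing its UUID, digest and TIFF format values instead.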


Proposed PREMIS metadata for normalized file (preservation copy)

Unlike the table above, which includes only one sample event, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and message digest calculation (checksum generation).

PREMIS entity Semantic unit Semantic component Sample value(s) Notes
object objectIdentifier objectIdentifierType UUID mandatory unit and component
object objectIdentifier objectIdentifierValue 270bd067-0483-4c5f-bdec-f2cbd6e651aa mandatory unit and component
object objectCategory none file mandatory unit and component
object objectCharacteristics compositionLevel 0 mandatory unit and component
object objectCharacteristics/fixity messageDigestAlgorithm MD5
object objectCharacteristics/fixity messageDigest e479688508922354bdab09bca60d8d0e
object objectCharacteristics/fixity messageDigestOriginator City of Vancouver Archives
object objectCharacteristics/format/formatDesignation formatName Tagged Image File Format format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatDesignation formatVersion 6.0 format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatRegistry formatRegistryName PRONOM format is a mandatory unit; must use either formatDesignation or formatRegistry
object objectCharacteristics/format/formatRegistry formatRegistryKey fmt/10 format is a mandatory unit; must use either formatDesignation or formatRegistry
object relationship relationshipType derivation
object relationship relationshipSubType has source
object relationship/relatedObjectIdentification relatedObjectIdentifierType UUID
object relationship/relatedObjectIdentification relatedObjectIdentifierValue 0db50321-6d7b-4291-89ec-a8b0adc1ff96
object relationship/relatedEventIdentification relatedEventIdentifierType Archivematica ID "For derivative relationships between objects relatedEventIdentification must be recorded."
object relationship/relatedEventIdentification relatedEventIdentifierValue [alphanumeric code] "For derivative relationships between objects relatedEventIdentification must be recorded."
event eventIdentifier eventIdentifierType UUID mandatory unit and component
event eventIdentifier eventIdentifierValue 05y50321-6d7b-4291-89ag-a8b0fhc1f286 mandatory unit and component
event eventType none creation mandatory unit and component
event eventDateTime none 2010-08-01T09:08:44-03:00 mandatory unit and component
event eventDetail none program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%"
event linkingAgentIdentifier linkingAgentIdentifierType preservation system used to link an agent to an event; not mandatory but recommended
event linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7 used to link an agent to an event; not mandatory but recommended
event eventIdentifier eventIdentifierType Archivematica ID mandatory unit and component
event eventIdentifier eventIdentifierValue [alphanumeric code] mandatory unit and component
event eventType none message digest calculation mandatory unit and component
event eventDateTime none 2010-08-01T09:08:46-01:00 mandatory unit and component
event eventDetail none program="md5deep"; version="3.6"
event linkingAgentIdentifier linkingAgentIdentifierType preservation system used to link an agent to an event; not mandatory but recommended
event linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7 used to link an agent to an event; not mandatory but recommended
agent agentIdentifier agentIdentifierType preservation system mandatory unit and component
agent agentIdentifier agentIdentifierValue Archivematica-0.7 mandatory unit and component
agent agentName none Archivematica
agent agentType none software
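The derivation relationships in the two tables are meant to mirror each other: the original "is source of" the preservation copy, and the copy "has source" the original. A small Python sketch of a consistency check, using hypothetical dataclasses to stand in for parsed PREMIS objects:

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    relationship_type: str
    sub_type: str
    related_object: str  # UUID of the related object


@dataclass
class PremisObject:
    uuid: str
    relationships: list = field(default_factory=list)


# Reciprocal derivation subtypes, as used in the two tables above.
RECIPROCAL = {"is source of": "has source", "has source": "is source of"}


def is_reciprocal(a, b):
    """Return True if a points at b and b points back at a with the mirrored subtype."""
    for rel in a.relationships:
        if rel.relationship_type == "derivation" and rel.related_object == b.uuid:
            mirrored = RECIPROCAL.get(rel.sub_type)
            return any(
                back.relationship_type == "derivation"
                and back.related_object == a.uuid
                and back.sub_type == mirrored
                for back in b.relationships
            )
    return False


original = PremisObject("0db50321-6d7b-4291-89ec-a8b0adc1ff96")
copy = PremisObject("270bd067-0483-4c5f-bdec-f2cbd6e651aa")
original.relationships.append(Relationship("derivation", "is source of", copy.uuid))
copy.relationships.append(Relationship("derivation", "has source", original.uuid))
print(is_reciprocal(original, copy))  # True
```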


Event metadata

Receive SIP (the SIP is placed in 1-receiveSIP and assigned a UUID)

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 83n50321-6d7b-3847-89ag-a8b0fhc1f273
eventType none ingestion
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType repository code
linkingAgentIdentifier linkingAgentIdentifierValue CVA
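Each event table records the same core semantic units. A minimal Python sketch of assembling them for the Receive SIP event, assuming a random (version 4) UUID from the standard uuid module and a timezone-aware datetime so that eventDateTime carries a UTC offset like the sample values. The make_event helper and the dictionary layout are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone


def make_event(event_type, agent_type, agent_value, when=None, detail=""):
    """Return a dict holding the core semantic units of one PREMIS event."""
    if when is None:
        when = datetime.now(timezone.utc)
    return {
        "eventIdentifierType": "UUID",
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": event_type,
        # isoformat() on an aware datetime appends the offset, e.g. -01:00
        "eventDateTime": when.isoformat(timespec="seconds"),
        "eventDetail": detail,
        "linkingAgentIdentifierType": agent_type,
        "linkingAgentIdentifierValue": agent_value,
    }


when = datetime(2010, 8, 1, 9, 8, 46, tzinfo=timezone(timedelta(hours=-1)))
event = make_event("ingestion", "repository code", "CVA", when)
print(event["eventDateTime"])  # 2010-08-01T09:08:46-01:00
```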


Check checksums

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 21h50321-6d7b-3855-89ag-a8b0fhc1f256
eventType none fixity check
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="md5deep"; version="3.6"
eventOutcomeInformation eventOutcome {pass; fail}
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
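A fixity check recomputes a file's digest and compares it with the value recorded earlier. The sketch below uses Python's hashlib rather than md5deep, reads the data in chunks so large files need not fit in memory, and maps the comparison onto the {pass; fail} outcome vocabulary. An in-memory stream stands in for a real file.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_check(stream, expected_md5):
    """Return 'pass' if the recomputed digest matches the recorded one, else 'fail'."""
    return "pass" if md5_of_stream(stream) == expected_md5.strip().lower() else "fail"


# With a real file: open(path, "rb") and pass the file object instead.
outcome = fixity_check(io.BytesIO(b"abc"), "900150983cd24fb0d6963f7d28e17f72")
print(outcome)  # pass
```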


Generate checksums

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 0hc50321-6d7b-3847-89ag-a8b0fhc1f245
eventType none message digest calculation
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="md5deep"; version="3.6"
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote e479688508922354bdab09bca60d8d0e
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
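Generating checksums for a whole SIP can be sketched as a directory walk that records one MD5 digest per file, which is the value that ends up in eventOutcomeDetailNote. This uses Python's os.walk and hashlib as a stand-in for md5deep; the demo file content is a placeholder, not a real bitmap.

```python
import hashlib
import os
import tempfile


def generate_checksums(root):
    """Walk root and return {relative path: MD5 hex digest} for every file."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                # Whole-file read keeps the sketch short; chunk it for large files.
                manifest[os.path.relpath(full, root)] = hashlib.md5(fh.read()).hexdigest()
    return manifest


with tempfile.TemporaryDirectory() as sip:
    with open(os.path.join(sip, "cover_image.bmp"), "wb") as fh:
        fh.write(b"abc")  # placeholder content
    manifest = generate_checksums(sip)

print(manifest)  # {'cover_image.bmp': '900150983cd24fb0d6963f7d28e17f72'}
```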


Review SIP

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 22n50321-6d7b-3847-89ag-a8b0fhc1f288
eventType none SIP review
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none [free text field - could include information about the Submission Information Agreement against which the SIP was checked, etc.]
eventOutcomeInformation eventOutcome {pass; conditional pass}
eventOutcomeDetail eventOutcomeDetailNote
  • some files missing
  • appraisal required
  • some files deleted by the archivist
linkingAgentIdentifier linkingAgentIdentifierType repository code
linkingAgentIdentifier linkingAgentIdentifierValue CVA


Start quarantine

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 22m50321-6d7b-4637-89ag-a8b0fhc1f234
eventType none start quarantine
eventDateTime none 2010-09-01T09:09:00-02:00/2010-10-01T09:09:00-02:00 This is a start date and end date (date ranges are permitted by PREMIS)
eventDetail none
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType repository code
linkingAgentIdentifier linkingAgentIdentifierValue CVA
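The quarantine eventDateTime above uses the ISO 8601 start/end form. A Python sketch that builds and parses such an interval, assuming a hypothetical 30-day quarantine period (the length that separates the two sample dates):

```python
from datetime import datetime, timedelta, timezone


def quarantine_interval(start, days):
    """Return an ISO 8601 'start/end' string for a quarantine lasting `days` days."""
    end = start + timedelta(days=days)
    return f"{start.isoformat(timespec='seconds')}/{end.isoformat(timespec='seconds')}"


def parse_interval(value):
    """Split a 'start/end' eventDateTime value back into two aware datetimes."""
    start_text, end_text = value.split("/")
    return datetime.fromisoformat(start_text), datetime.fromisoformat(end_text)


minus_two = timezone(timedelta(hours=-2))
interval = quarantine_interval(datetime(2010, 9, 1, 9, 9, tzinfo=minus_two), 30)
print(interval)  # 2010-09-01T09:09:00-02:00/2010-10-01T09:09:00-02:00
start, end = parse_interval(interval)
```

The end of the interval is also the eventDateTime recorded for the End quarantine event.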


End quarantine

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 4am50321-6d7b-a46g-8900-a8b0fhc1f287
eventType none end quarantine
eventDateTime none 2010-10-01T09:09:00-02:00
eventDetail none
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType repository code
linkingAgentIdentifier linkingAgentIdentifierValue CVA


Unpackage packaged files

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 12j50321-6d7b-0047-89ag-a8b0fhc1f211
eventType none unpackage
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="easyextract"; version="0.1.0"
eventOutcomeInformation eventOutcome unpackaged
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7


Assign UUIDs

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 90n50321-6d7b-6453-89ag-a8b0fhc1f250
eventType none identifier assignment
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="UUID"; version="1.6.2"
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote 270bd067-0483-4c5f-bdec-f2cbd6e651aa
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
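A sketch of the identifier assignment step in Python, giving each file a version 4 UUID and recording it in eventOutcomeDetailNote as the table does. It uses Python's uuid module instead of the UUID program named in eventDetail, so eventDetail is omitted; the file list and the function name are hypothetical.

```python
import uuid


def assign_uuids(file_paths):
    """Return {path: UUID string} plus one identifier assignment event per file."""
    assigned, events = {}, []
    for path in file_paths:
        file_uuid = str(uuid.uuid4())
        assigned[path] = file_uuid
        events.append({
            "eventIdentifierType": "UUID",
            "eventIdentifierValue": str(uuid.uuid4()),  # the event gets its own UUID
            "eventType": "identifier assignment",
            "eventOutcomeDetailNote": file_uuid,
        })
    return assigned, events


assigned, events = assign_uuids(["objects/cover_image.bmp", "objects/newsletter.doc"])
```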


Remove prohibited characters

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 83n50321-6d7b-3847-89ag-a8b0fhc1f273
eventType none name cleanup
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="detox"; version="1.2.0-1"
eventOutcomeInformation eventOutcome prohibited characters removed
eventOutcomeDetail eventOutcomeDetailNote Original name="cover image.bmp"; cleaned up name="cover_image.bmp"
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
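The name cleanup step can be sketched as a pair of regular-expression substitutions. The allowed character set here (letters, digits, ".", "_" and "-", with whitespace turned into underscores) is an assumption for illustration and is not detox's actual rule table; the outcome strings follow the sample values above.

```python
import re


def clean_name(name):
    """Turn whitespace into '_' and drop characters outside a hypothetical safe set."""
    name = re.sub(r"\s+", "_", name)
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


def name_cleanup_note(original):
    """Return (eventOutcome, eventOutcomeDetailNote), or (None, None) if nothing changed."""
    cleaned = clean_name(original)
    if cleaned == original:
        return None, None
    return ("prohibited characters removed",
            f'Original name="{original}"; cleaned up name="{cleaned}"')


outcome, note = name_cleanup_note("cover image.bmp")
print(note)  # Original name="cover image.bmp"; cleaned up name="cover_image.bmp"
```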


Scan for viruses

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 09n50321-6d7b-3596-89ag-a8b0fhc1f288
eventType none virus check
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="ClamAV"; version="0.95.2"
eventOutcomeInformation eventOutcome pass
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
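To fill in eventOutcome automatically, the scanner's exit status has to be interpreted. clamscan exits with 0 when no virus is found, 1 when a virus is found and 2 when an error occurs. The Python sketch below maps those statuses to outcome values, where "fail" is an assumed counterpart to the table's "pass". Running scan_file requires ClamAV to be installed.

```python
import subprocess


def scan_outcome(returncode):
    """Map a clamscan exit status to an eventOutcome value ('fail' is assumed vocabulary)."""
    if returncode == 0:
        return "pass"
    if returncode == 1:
        return "fail"
    raise RuntimeError(f"clamscan reported an error (exit status {returncode})")


def scan_file(path):
    """Run clamscan on one file and return (eventOutcome, scanner output)."""
    result = subprocess.run(
        ["clamscan", "--no-summary", path], capture_output=True, text=True
    )
    return scan_outcome(result.returncode), result.stdout.strip()
```

Raising on status 2 keeps a scanner failure from being silently recorded as a clean result.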


Identify format

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 16n50321-6d7b-3847-89ag-a8b0fhc1f299
eventType none format identification
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="File Information Tool Set"; version="0.2.6"
eventOutcomeInformation eventOutcome {positive; tentative; unidentified}
eventOutcomeDetail eventOutcomeDetailNote fmt/116
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
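Format identification here is performed by FITS, which wraps tools such as DROID and JHOVE. Purely to illustrate how the outcome values relate to a PRONOM key, the following Python sketch applies a crude, hypothetical heuristic: a file starting with "BM" whose DIB header is 40 bytes long is reported as fmt/116, and any other "BM" file as tentative. Real identification relies on PRONOM signatures and is considerably more careful.

```python
import struct


def identify_bmp(header):
    """Return (eventOutcome, PRONOM key) for the first bytes of a file (toy heuristic)."""
    if not header.startswith(b"BM"):
        return "unidentified", None
    # The DIB header size is a little-endian uint32 at offset 14;
    # 40 bytes corresponds to the Windows 3.x BITMAPINFOHEADER.
    if len(header) >= 18 and struct.unpack_from("<I", header, 14)[0] == 40:
        return "positive", "fmt/116"
    return "tentative", None
```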


Validate format

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 33n50321-6d7b-3888-89ag-a8b0fhc1f264
eventType none validation
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="File Information Tool Set"; version="0.2.6"
eventOutcomeInformation eventOutcome {pass; partial pass; fail}
eventOutcomeDetail eventOutcomeDetailNote format="Windows Bitmap"; version="3.0"; result="Well-formed and valid"
  • It is important to record the format and version against which JHOVE validates the file; otherwise JHOVE can misidentify the format, or fail to identify it, and give a false positive for validation (for example, by identifying the format as "bytestream" and then declaring the file "Well-formed and valid").
  • This semantic unit can be repeated if there is a specific error message relating to failed validation.
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
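The first note above can be turned into a rule: a "Well-formed and valid" result only counts when the validator reports the expected format and version. A Python sketch with a hypothetical report dictionary and a hypothetical mapping onto {pass; partial pass; fail}. It compares status text case-insensitively because validator output may capitalize it differently.

```python
def validation_outcome(expected_format, expected_version, report):
    """Return (eventOutcome, eventOutcomeDetailNote) for a validator report."""
    fmt, version = report["format"], report.get("version")
    status_text = report["status"]
    status = status_text.lower()
    note = f'format="{fmt}"; version="{version}"; result="{status_text}"'
    if (fmt, version) != (expected_format, expected_version):
        # e.g. a file reported as "bytestream" cannot be trusted as valid
        return "fail", note
    if status == "well-formed and valid":
        return "pass", note
    if status.startswith("well-formed"):
        return "partial pass", note
    return "fail", note


outcome, note = validation_outcome(
    "Windows Bitmap", "3.0",
    {"format": "Windows Bitmap", "version": "3.0", "status": "Well-formed and valid"},
)
print(outcome, note)
```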


Normalize to preservation format

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 05n50321-6d7b-3447-89ag-a8b0fhc1f274
eventType none normalization
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%"
eventOutcomeInformation eventOutcome {normalized; not normalized}
eventOutcomeDetail eventOutcomeDetailNote
  • Normalization failed
  • Already in preservation format. No need to normalize
eventOutcomeDetail eventOutcomeDetailNote cover_image.tiff
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7
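The %name% tokens in the normalization command are placeholders filled in at run time. A Python sketch of that expansion, quoting each value with shlex.quote so that file names containing spaces survive; the paths are hypothetical, and this is not necessarily how Archivematica performs the substitution.

```python
import re
import shlex

TEMPLATE = ("%convertPath% %fileFullName% +compress "
            "%preservationFileDirectory%%fileTitle%.%preservationFormat%")


def expand_command(template, values):
    """Replace each %name% placeholder with a shell-quoted value; unknown names raise KeyError."""
    return re.sub(r"%(\w+)%", lambda m: shlex.quote(values[m.group(1)]), template)


values = {  # hypothetical paths for illustration
    "convertPath": "/usr/bin/convert",
    "fileFullName": "/var/archivematica/objects/cover_image.bmp",
    "preservationFileDirectory": "/var/archivematica/preservation/",
    "fileTitle": "cover_image",
    "preservationFormat": "tiff",
}
command = expand_command(TEMPLATE, values)
print(command)
# /usr/bin/convert /var/archivematica/objects/cover_image.bmp +compress /var/archivematica/preservation/cover_image.tiff
```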



Create file

This event is recorded only for preservation copies, not for original files.

Semantic unit Semantic component Sample value(s) Notes
eventIdentifier eventIdentifierType UUID
eventIdentifier eventIdentifierValue 55n50321-6d7b-3987-89ag-a8b0fhc1f212
eventType none creation
eventDateTime none 2010-08-01T09:08:46-01:00
eventDetail none program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%"
eventOutcomeInformation eventOutcome
eventOutcomeDetail eventOutcomeDetailNote
linkingAgentIdentifier linkingAgentIdentifierType preservation system
linkingAgentIdentifier linkingAgentIdentifierValue Archivematica-0.7


Sample PREMIS file

A sample PREMIS file has been mocked up and is available at http://www.archivematica.org/downloads/archivematica-digiprov-sample-v1.xml.

Sources consulted