Difference between revisions of "Metadata elements"

Line 21: Line 21:
 
*For most files, the relationships semantic unit would be used twice: once for the preservation copy and once for the access copy
 
*For most files, the relationships semantic unit would be used twice: once for the preservation copy and once for the access copy
 
*The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events
 
*The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events
*More than one agent may be included
+
 
 
<br>
 
<br>
  
Line 335: Line 335:
 
|2010-08-01T09:08:46-01:00
 
|2010-08-01T09:08:46-01:00
 
|mandatory unit and component
 
|mandatory unit and component
 +
|-
 +
|eventDetail
 +
|none
 +
|program="MD5deep"; version="3.6"
 +
|
 
|-
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifier
Line 345: Line 350:
 
|CVA
 
|CVA
 
|used to link an agent to an event; not mandatory but recommended
 
|used to link an agent to an event; not mandatory but recommended
|-
 
|linkingObjectIdentifier
 
|linkingObjectIdentifierType
 
|UUID
 
|used to link an object to an event; not mandatory but recommended
 
|-
 
|linkingObjectIdentifier
 
|linkingObjectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|used to link an object to an event; not mandatory but recommended
 
 
|-
 
|-
 
|agentIdentifier
 
|agentIdentifier
Line 707: Line 702:
 
Since AIPs are constructed from both original and normalized files, we need to determine what PREMIS elements should be used to describe the normalized files and their relationship to the originals.
 
Since AIPs are constructed from both original and normalized files, we need to determine what PREMIS elements should be used to describe the normalized files and their relationship to the originals.
  
===Original file metadata===
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|is source of
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the normalized file)
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Normalization
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|2010:05:19 00:49:15+00:00
 
|-
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|Processing completed
 
|-
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|Output #0, wav, to '/tmp/MultimediaSIP-9ece5881-640e-4bdc-9863-4ff50046a0bd/objects/sample.wav': Stream #0.0: Audio: pcm_s16le, 8000 Hz, stereo, s16, 256 kb/s
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|
 
|-
 
|Agent
 
|3.2 agentName
 
|none
 
|FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2
 
|-
 
|}
 
  
 
<br>
 
<br>
  
===Normalized file metadata===
 
  
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|has source
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the original file)
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.1 relatedEventIdentifierType
 
|
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.2 relatedEventIdentifierValue
 
|
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.1 messageDigestAlgorithm
 
|MD5
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|537e0206ae83f815e4fg5f28464f6rt7
 
|-
 
|}
 
  
 
[[Category:Development documentation]]
 
[[Category:Development documentation]]

Revision as of 16:34, 8 September 2010


This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.

This process involves:

  1. Using the InterPARES Chain of Preservation (COP) model and the CoP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
  2. Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see Existing elements)
  3. Comparing 1) to 2) in order to determine what gaps exist in Archivematica
  4. Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
  5. Structuring the required elements into the Repository eXchange Package (RXP) specification
  6. Determining what metadata belongs in the DIP(s)



Proposed PREMIS metadata for original file

This table is a template for metadata elements for the original file. Please note the following:

  • The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
  • For most files, the relationships semantic unit would be used twice: once for the preservation copy and once for the access copy.
  • The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events.


Semantic unit | Semantic component | Sample value(s) | Notes
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component
objectIdentifier | objectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component
objectCategory | none | file | mandatory unit and component
significantProperties | significantPropertiesType | ImageWidth | repeatable semantic unit
significantProperties | significantPropertiesValue | 1024 | repeatable semantic unit
objectCharacteristics | compositionLevel | 0 | mandatory unit and component
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver |
objectCharacteristics | size | 787510 |
objectCharacteristics/format/formatDesignation | formatName | Windows Bitmap | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatDesignation | formatVersion | 3.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/116 | format is a mandatory unit; must use either formatDesignation or formatRegistry
relationship | relationshipType | derivation |
relationship | relationshipSubType | is source of |
relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component if there is a related object
relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded."
relationship/relatedEventIdentification | relatedEventIdentifierValue | 006 | "For derivative relationships between objects relatedEventIdentification must be recorded."
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component
eventType | none | normalization | mandatory unit and component
eventDateTime | none | 2009-12-01T09:09:00-02:00 | mandatory unit and component
eventDetail | none | Program="ImageMagick"; version="6.6.4.0" | This element can be used to record information about software used and eliminates the need to have agent entities for software programs
eventOutcomeInformation | eventOutcome | normalization successful |
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | used to link an agent to an event; not mandatory but recommended
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | used to link an agent to an event; not mandatory but recommended
agentIdentifier | agentIdentifierType | repository code | mandatory unit and component
agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component
agentName | none | City of Vancouver Archives |
agentType | none | organization |
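
The eventDetail note in the table above describes recording software information as key="value" pairs. The following is a minimal sketch of how such a string could be built and read back, assuming the pairs are always quoted and separated by semicolons as in the sample values. The function names `format_event_detail` and `parse_event_detail` are hypothetical and are not part of Archivematica.

```python
import re


def format_event_detail(program: str, version: str) -> str:
    """Build an eventDetail string in the style of the sample values."""
    return f'program="{program}"; version="{version}"'


def parse_event_detail(detail: str) -> dict:
    """Read key="value" pairs back into a dict.

    Keys are lower-cased because the samples use both "Program" and "program".
    """
    pairs = re.findall(r'(\w+)="([^"]*)"', detail)
    return {key.lower(): value for key, value in pairs}


print(format_event_detail("MD5deep", "3.6"))
print(parse_event_detail('Program="ImageMagick"; version="6.6.4.0"'))
```

Lower-casing the keys is a design choice of this sketch only; the page does not say which capitalization is preferred.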


Proposed PREMIS metadata for normalized file (preservation copy)

Semantic unit | Semantic component | Sample value(s) | Notes
objectIdentifier | objectIdentifierType | UUID | mandatory unit and component
objectIdentifier | objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component
objectCategory | none | file | mandatory unit and component
objectCharacteristics | compositionLevel | 0 | mandatory unit and component
objectCharacteristics/fixity | messageDigestAlgorithm | MD5 |
objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e |
objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver |
objectCharacteristics/format/formatDesignation | formatName | Tagged Image File Format | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatDesignation | formatVersion | 6.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/10 | format is a mandatory unit; must use either formatDesignation or formatRegistry
objectCharacteristics/creatingApplication | creatingApplicationName | ImageMagick |
objectCharacteristics/creatingApplication | creatingApplicationVersion | 6.6.4.0 |
relationship | relationshipType | derivation |
relationship | relationshipSubType | has source |
relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID |
relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component
eventIdentifier | eventIdentifierValue | 006 | mandatory unit and component
eventType | none | creation | mandatory unit and component
eventDateTime | none | 2010-08-01T09:08:44-03:00 | mandatory unit and component
eventDetail | none | %convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat% |
eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component
eventIdentifier | eventIdentifierValue | 002 | mandatory unit and component
eventType | none | message digest calculation | mandatory unit and component
eventDateTime | none | 2010-08-01T09:08:46-01:00 | mandatory unit and component
eventDetail | none | program="MD5deep"; version="3.6" |
linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | used to link an agent to an event; not mandatory but recommended
linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | used to link an agent to an event; not mandatory but recommended
agentIdentifier | agentIdentifierType | repository code | mandatory unit and component
agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component
agentName | none | City of Vancouver Archives |
agentType | none | organization |
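
The two tables above describe the same derivation from both sides: the original file is recorded as "is source of" the normalized file, and the normalized file as "has source" the original. As a rough illustration, the pair of relationship entries could be generated together so the two UUIDs always point at each other. The helper `derivation_pair` and the flat dict layout are assumptions made for this sketch, not an Archivematica data structure.

```python
def derivation_pair(original_uuid, normalized_uuid):
    """Return (entry for the original file, entry for the normalized file)."""

    def relationship(sub_type, related_uuid):
        return {
            "relationshipType": "derivation",
            "relationshipSubType": sub_type,
            "relatedObjectIdentifierType": "UUID",
            "relatedObjectIdentifierValue": related_uuid,
        }

    # The original points at the normalized file, and vice versa.
    return (
        relationship("is source of", normalized_uuid),
        relationship("has source", original_uuid),
    )


for_original, for_normalized = derivation_pair(
    "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "270bd067-0483-4c5f-bdec-f2cbd6e651aa",
)
print(for_original)
print(for_normalized)
```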


Events requiring metadata

Receive SIP (SIP gets placed in 1-receiveSIP)

Semantic component | Sample value(s) | Automated? | Notes
2.1.1 eventIdentifierType | | Y |
2.1.2 eventIdentifierValue | | Y |
3.1.1 agentIdentifierType | user account | Y |
3.1.2 agentIdentifierValue | demo | Y |
3.1.1 agentIdentifierType | workstation id | Y |
3.1.2 agentIdentifierValue | archivematica-1 | Y |



Check checksums

Metadata for each file in the SIP

Semantic component | Sample value(s) | Automated? | Notes
2.1.1 eventIdentifierType | | Y |
2.1.2 eventIdentifierValue | | Y |
2.2 eventType | | Y |
2.3 eventDateTime | | Y |
3.1.1 agentIdentifierType | software | Y |
3.1.2 agentIdentifierValue | MD5sum | Y |
2.5.1 eventOutcome | Pass; fail | Y |
2.5.2 eventOutcomeDetail | j6059_02.wav FAILED | Y |
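
A minimal sketch of how the outcome fields in this table could be filled in for one file, assuming the expected MD5 value is supplied with the SIP. The table names MD5sum as the agent; Python's `hashlib` is used here only as a stand-in, and the function name `check_file` is hypothetical.

```python
import hashlib


def check_file(name: str, data: bytes, expected_md5: str) -> dict:
    """Compare a file's MD5 digest with the expected value."""
    actual = hashlib.md5(data).hexdigest()
    passed = actual == expected_md5.strip().lower()
    event = {"eventOutcome": "pass" if passed else "fail"}
    if not passed:
        # Mirrors the sample value "j6059_02.wav FAILED".
        event["eventOutcomeDetail"] = f"{name} FAILED"
    return event


content = b"example audio bytes"
good = hashlib.md5(content).hexdigest()
print(check_file("j6059_01.wav", content, good))
print(check_file("j6059_02.wav", content, "0" * 32))
```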


Generate checksums

Metadata for each file in the SIP for which a checksum is generated by Archivematica

Semantic component | Sample value(s) | Automated? | Notes
2.1.1 eventIdentifierType | | Y |
2.1.2 eventIdentifierValue | | Y |
2.2 eventType | | Y |
2.3 eventDateTime | | Y |
3.1.1 agentIdentifierType | software | Y |
3.1.2 agentIdentifierValue | MD5sum | Y |
1.5.2.1 messageDigestAlgorithm | MD5 | Y |
1.5.2.2 messageDigest | fa10ee76a575bafe43335abf6cd60bae | Y |
1.5.2.3 messageDigestOriginator | City of Vancouver | Y |
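
The three fixity components in this table could be produced in one step when a checksum is generated. The sketch below reads a binary stream in chunks so that large files need not be held in memory. It again uses `hashlib` as a stand-in for MD5sum, and `fixity_for` is a hypothetical name.

```python
import hashlib
import io


def fixity_for(stream, originator: str, chunk_size: int = 65536) -> dict:
    """Compute the fixity components for an open binary stream."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return {
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
        "messageDigestOriginator": originator,
    }


print(fixity_for(io.BytesIO(b"sample file content"), "City of Vancouver"))
```

In practice the stream would come from `open(path, "rb")`; an in-memory stream is used here so the example runs without files on disk.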



Review SIP

Semantic component | Sample value(s) | Automated? | Notes
2.1.1 eventIdentifierType | | Y |
2.1.2 eventIdentifierValue | | Y |
2.2 eventType | | Y |
2.3 eventDateTime | | Y |
3.1.1 agentIdentifierType | user account | Y |
3.1.2 agentIdentifierValue | demo | Y |
2.5.1 eventOutcome | {pass; conditional pass} | N | If it fails, it doesn't move on to become an AIP, so failure is not an option
2.5.2 eventOutcomeDetail | Some files missing; appraisal required | N | This field is mandatory if eventOutcome = conditional pass
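
The two rules in the notes column (only "pass" or "conditional pass" is allowed, and eventOutcomeDetail is mandatory for a conditional pass) can be expressed as a small validation step. This is a sketch under those two rules only; `review_event` is a hypothetical name.

```python
ALLOWED_OUTCOMES = {"pass", "conditional pass"}


def review_event(outcome, detail=None):
    """Build the outcome fields for a Review SIP event, enforcing both rules."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"eventOutcome must be one of {sorted(ALLOWED_OUTCOMES)}")
    if outcome == "conditional pass" and not detail:
        raise ValueError("eventOutcomeDetail is mandatory for a conditional pass")
    event = {"eventOutcome": outcome}
    if detail:
        event["eventOutcomeDetail"] = detail
    return event


print(review_event("pass"))
print(review_event("conditional pass", "Some files missing; appraisal required"))
```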


Quarantine SIP

-when the SIP went into quarantine and when it came out

Unpack zipped files

-tool used, time unpacked, event outcome (successful?), map of zipped file to unzipped contents (map for each unzipped file + link to event)
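
As an illustration of the items listed above, the sketch below records the tool, the time, the outcome, and a map from the zipped file to its contents. It uses Python's `zipfile` module as a stand-in for whatever tool the workflow actually uses. The function name `describe_unpack`, the "unpacking" event type, and the "contents" key are all hypothetical.

```python
import io
import zipfile
from datetime import datetime, timezone


def describe_unpack(zip_name, zip_bytes):
    """Return a dict describing one unpack event for an in-memory zip file."""
    event = {
        "eventType": "unpacking",  # hypothetical label, not from this page
        "eventDetail": 'program="Python zipfile"',  # stand-in for the real tool
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
    }
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            # Skip directory entries; keep only the files inside the archive.
            members = [n for n in archive.namelist() if not n.endswith("/")]
    except zipfile.BadZipFile:
        event["eventOutcome"] = "fail"
        event["contents"] = {}
        return event
    event["eventOutcome"] = "pass"
    event["contents"] = {zip_name: members}
    return event


# Build a tiny zip in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("sub/b.txt", "beta")

print(describe_unpack("transfer.zip", buffer.getvalue())["contents"])
```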

Assign UUIDs

-the usual event metadata, plus a map from original name to UUID
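
A minimal sketch of the name-to-UUID map, assuming random (version 4) UUIDs; the page does not say which UUID version is used. `assign_uuids` is a hypothetical name.

```python
import uuid


def assign_uuids(original_names):
    """Map each original file name to a newly generated UUID string."""
    return {name: str(uuid.uuid4()) for name in original_names}


mapping = assign_uuids(["sample.wav", "images/scan001.bmp"])
for name, identifier in mapping.items():
    print(name, "->", identifier)
```

Because the map is keyed by name, two files with the same relative path would collapse into one entry; a real workflow would need names that are unique within the SIP.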

Remove prohibited characters

-the usual event metadata, plus a map from original name to sanitized name
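
The page does not list which characters are prohibited. The sketch below assumes, purely for illustration, that anything other than ASCII letters, digits, dot, underscore and hyphen is replaced with an underscore, and it builds the map from original name to sanitized name. `sanitize_name` and `sanitize_all` are hypothetical names.

```python
import re

# Assumption for this sketch only: keep letters, digits, ".", "_" and "-".
PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name: str) -> str:
    """Replace every prohibited character with an underscore."""
    return PROHIBITED.sub("_", name)


def sanitize_all(names):
    """Map each original name to its sanitized form."""
    return {name: sanitize_name(name) for name in names}


print(sanitize_all(["my file (1).wav", "sample.wav"]))
```

Two different original names can sanitize to the same result, which is one reason the map back to the original name is worth recording.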

Virus scan

-the usual event metadata, plus a result for each file (include eventOutcomeDetail to describe the type of failure, such as the type of malware found)

File characterization

-identification: format name, format version, registry name, registry key
-validation: well formed? Valid?

Appraise SIP

-the usual event metadata
-event outcome (no files removed; some files removed)
-list of files removed

Normalization to preservation formats

-everything already in the table plus identification information: format name, format version, registry name, registry key

Normalization to access formats


Mandatory PREMIS elements (mandatory semantic units + mandatory components)

Entity | Semantic unit | Semantic component | Present in Archivematica?
Object | 1.1 objectIdentifier | 1.1.1 objectIdentifierType | No
Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | Yes
Object | 1.2 objectCategory | none | No
Object | 1.5 objectCharacteristics | 1.5.1 compositionLevel | No
Object | 1.5.4 objectCharacteristics/format | Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used: |
Object | 1.5.4 objectCharacteristics/format | 1.5.4.1.1 formatName | Yes
Object | 1.5.4 objectCharacteristics/format | 1.5.4.2.1 formatRegistryName | No
Object | 1.5.4 objectCharacteristics/format | 1.5.4.2.2 formatRegistryKey | Yes
Object | 1.7 storage | Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded." | No, but retrieval may be managed through UUIDs.
Event | 2.1 eventIdentifier | 2.1.1 eventIdentifierType | No
Event | 2.1 eventIdentifier | 2.1.2 eventIdentifierValue | No
Event | 2.2 eventType | none | Partial
Event | 2.3 eventDateTime | none | Partial
Agent | 3.1 agentIdentifier | 3.1.1 agentIdentifierType | No
Agent | 3.1 agentIdentifier | 3.1.2 agentIdentifierValue | No
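
The gap analysis in this table amounts to checking each record against a list of mandatory components. A rough sketch of such a check follows, assuming records are flat dicts keyed by semantic component name. The 1.7 storage rule is left out because the table notes that contentLocation is implicit when the objectIdentifier is used for retrieval. `MANDATORY`, `is_blank` and `missing_mandatory` are hypothetical names.

```python
MANDATORY = {
    "object": ["objectIdentifierType", "objectIdentifierValue",
               "objectCategory", "compositionLevel"],
    "event": ["eventIdentifierType", "eventIdentifierValue",
              "eventType", "eventDateTime"],
    "agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def is_blank(value):
    # compositionLevel can legitimately be 0, so only None and "" count as blank.
    return value is None or value == ""


def missing_mandatory(entity, record):
    """List the mandatory components that the record does not supply."""
    missing = [name for name in MANDATORY[entity] if is_blank(record.get(name))]
    if entity == "object":
        # Either formatDesignation or formatRegistry must be used.
        has_designation = not is_blank(record.get("formatName"))
        has_registry = not (is_blank(record.get("formatRegistryName"))
                            or is_blank(record.get("formatRegistryKey")))
        if not (has_designation or has_registry):
            missing.append("format")
    return missing


sample_object = {
    "objectIdentifierType": "UUID",
    "objectIdentifierValue": "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    "objectCategory": "file",
    "compositionLevel": 0,
    "formatRegistryName": "PRONOM",
    "formatRegistryKey": "fmt/116",
}
print(missing_mandatory("object", sample_object))
print(missing_mandatory("event", {"eventType": "normalization"}))
```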


PREMIS elements relating to derived objects

Since AIPs are constructed from both original and normalized files, we need to determine what PREMIS elements should be used to describe the normalized files and their relationship to the originals.