[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Metadata elements
 
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
  
 
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
 
  
This process involves:

# Using the InterPARES Chain of Preservation (CoP) model and the CoP/PREMIS crosswalk to identify the elements required for objects preserved by Archivematica
# Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map it to METS and PREMIS elements (see [[Existing elements]])
# Comparing the results of steps 1 and 2 to determine what gaps exist in Archivematica
# Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
# Structuring the required elements according to the [http://wiki.fcla.edu:8000/TIPR/21 Repository eXchange Package (RXP) specification]
# Determining what metadata belongs in the DIP(s)

*[[METS]]
*[[PREMIS]]
*[[PREMIS metadata: original files]]
*[[PREMIS metadata: normalized files]]
*[[PREMIS metadata: events]]
*[[PREMIS metadata: rights - 0.10]]
*[[PREMIS/METS for scalability]]
*[[RDF/OWL]]
 
 
 
 
<br>
 
 
 
==Proposed PREMIS metadata for original file==
 
 
 
This table is a template for metadata elements for the original file. Please note the following:
 
*The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
 
*For most files, the relationship semantic unit would be used twice: once for the preservation copy and once for the access copy.

*The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events.
 
<br>
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|objectIdentifier
 
|objectIdentifierType
 
|UUID
 
|mandatory unit and component
 
|-
 
|objectIdentifier
 
|objectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|mandatory unit and component
 
|-
 
|objectCategory
 
|none
 
|file
 
|mandatory unit and component
 
|-
 
|significantProperties
 
|significantPropertiesType
 
|ImageWidth
 
|repeatable semantic unit
 
|-
 
|significantProperties
 
|significantPropertiesValue
 
|1024
 
|repeatable semantic unit
 
|-
 
|objectCharacteristics
 
|compositionLevel
 
|0
 
|mandatory unit and component
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestAlgorithm
 
|MD5
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigest
 
|e479688508922354bdab09bca60d8d0e
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestOriginator
 
|City of Vancouver
 
|
 
|-
 
|objectCharacteristics
 
|size
 
|787510
 
|
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatName
 
|Windows Bitmap
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatVersion
 
|3.0
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryName
 
|PRONOM
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryKey
 
|fmt/116
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|relationship
 
|relationshipType
 
|derivation
 
|
 
|-
 
|relationship
 
|relationshipSubType
 
|is source of
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierType
 
|UUID
 
|mandatory unit and component if there is a related object
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|mandatory unit and component if there is a related object
 
|-
 
|relationship/relatedEventIdentification
 
|relatedEventIdentifierType
 
|Archivematica ID
 
|"For derivative relationships between objects relatedEventIdentification must be recorded."
 
|-
 
|relationship/relatedEventIdentification
 
|relatedEventIdentifierValue
 
|006
 
|"For derivative relationships between objects relatedEventIdentification must be recorded."
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|006
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|normalization
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2009-12-01T09:09:00-02:00
 
|mandatory unit and component
 
|-
 
|eventDetail
 
|none
 
|Program="ImageMagick"; version="6.6.4.0"
 
|
 
|-
 
|eventOutcomeInformation
 
|eventOutcome
 
|normalization successful
 
|
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierType
 
|repository code
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierValue
 
|CVA
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|agentIdentifier
 
|agentIdentifierType
 
|repository code
 
|mandatory unit and component
 
|-
 
|agentIdentifier
 
|agentIdentifierValue
 
|CVA
 
|mandatory unit and component
 
|-
 
|agentName
 
|none
 
|City of Vancouver Archives
 
|
 
|-
 
|agentType
 
|none
 
|organization
 
|
 
|-
 
|}
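A minimal sketch of how the object-entity rows of the table above might be serialized as PREMIS XML with Python's standard xml.etree.ElementTree. It assumes the PREMIS 2.x namespace (info:lc/xmlns/premis-v2), in which objectCategory is written as an xsi:type attribute on the object element rather than as a child element. significantProperties and the event and agent rows are left out, since events and agents are separate PREMIS entities. The function name and its parameters are hypothetical, not part of Archivematica.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: PREMIS 2.x XML, where objectCategory is expressed as xsi:type
# on <object>. Adjust the namespace for other PREMIS versions.
PREMIS_NS = "info:lc/xmlns/premis-v2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a PREMIS-namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def build_original_object(uuid: str, md5: str, originator: str, size: int,
                          format_name: str, format_version: str,
                          pronom_key: str, derived_uuid: str,
                          event_id: str) -> ET.Element:
    """Serialize the object-entity rows of the original-file table (hypothetical helper)."""
    obj = ET.Element(f"{{{PREMIS_NS}}}object", {f"{{{XSI_NS}}}type": "file"})
    ident = sub(obj, "objectIdentifier")
    sub(ident, "objectIdentifierType", "UUID")
    sub(ident, "objectIdentifierValue", uuid)

    chars = sub(obj, "objectCharacteristics")
    sub(chars, "compositionLevel", "0")
    fixity = sub(chars, "fixity")
    sub(fixity, "messageDigestAlgorithm", "MD5")
    sub(fixity, "messageDigest", md5)
    sub(fixity, "messageDigestOriginator", originator)
    sub(chars, "size", str(size))
    fmt = sub(chars, "format")
    designation = sub(fmt, "formatDesignation")
    sub(designation, "formatName", format_name)
    sub(designation, "formatVersion", format_version)
    registry = sub(fmt, "formatRegistry")
    sub(registry, "formatRegistryName", "PRONOM")
    sub(registry, "formatRegistryKey", pronom_key)

    # Derivation link to the preservation copy, plus the event that produced it.
    rel = sub(obj, "relationship")
    sub(rel, "relationshipType", "derivation")
    sub(rel, "relationshipSubType", "is source of")
    related = sub(rel, "relatedObjectIdentification")
    sub(related, "relatedObjectIdentifierType", "UUID")
    sub(related, "relatedObjectIdentifierValue", derived_uuid)
    related_event = sub(rel, "relatedEventIdentification")
    sub(related_event, "relatedEventIdentifierType", "Archivematica ID")
    sub(related_event, "relatedEventIdentifierValue", event_id)
    return obj


# Usage with the sample values from the table above.
original = build_original_object(
    uuid="0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    md5="e479688508922354bdab09bca60d8d0e",
    originator="City of Vancouver",
    size=787510,
    format_name="Windows Bitmap",
    format_version="3.0",
    pronom_key="fmt/116",
    derived_uuid="270bd067-0483-4c5f-bdec-f2cbd6e651aa",
    event_id="006",
)
print(ET.tostring(original, encoding="unicode"))
```

Both formatDesignation and formatRegistry are written here because the table lists both; either one alone would satisfy the mandatory format unit.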
 
 
 
<br>
 
 
 
==Proposed PREMIS metadata for normalized file (preservation copy)==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|objectIdentifier
 
|objectIdentifierType
 
|UUID
 
|mandatory unit and component
 
|-
 
|objectIdentifier
 
|objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|mandatory unit and component
 
|-
 
|objectCategory
 
|none
 
|file
 
|mandatory unit and component
 
|-
 
|objectCharacteristics
 
|compositionLevel
 
|0
 
|mandatory unit and component
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestAlgorithm
 
|MD5
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigest
 
|e479688508922354bdab09bca60d8d0e
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestOriginator
 
|City of Vancouver
 
|
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatName
 
|Tagged Image File Format
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatVersion
 
|6.0
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryName
 
|PRONOM
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryKey
 
|fmt/10
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/creatingApplication
 
|creatingApplicationName
 
|ImageMagick
 
|
 
|-
 
|objectCharacteristics/creatingApplication
 
|creatingApplicationVersion
 
|6.6.4.0
 
|
 
|-
 
|relationship
 
|relationshipType
 
|derivation
 
|
 
|-
 
|relationship
 
|relationshipSubType
 
|has source
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierType
 
|UUID
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|006
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|creation
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2010-08-01T09:08:44-03:00
 
|mandatory unit and component
 
|-
 
|eventDetail
 
|none
 
|%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%
 
|
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|002
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|message digest calculation
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2010-08-01T09:08:46-01:00
 
|mandatory unit and component
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierType
 
|repository code
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierValue
 
|CVA
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|linkingObjectIdentifier
 
|linkingObjectIdentifierType
 
|UUID
 
|used to link an object to an event; not mandatory but recommended
 
|-
 
|linkingObjectIdentifier
 
|linkingObjectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|used to link an object to an event; not mandatory but recommended
 
|-
 
|agentIdentifier
 
|agentIdentifierType
 
|repository code
 
|mandatory unit and component
 
|-
 
|agentIdentifier
 
|agentIdentifierValue
 
|CVA
 
|mandatory unit and component
 
|-
 
|agentName
 
|none
 
|City of Vancouver Archives
 
|
 
|-
 
|agentType
 
|none
 
|organization
 
|
 
|-
 
|}
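The two tables describe the two sides of one derivation: the original file "is source of" the preservation copy, and the preservation copy "has source" the original. A small consistency check can confirm that each derivation link has its inverse. This sketch uses hypothetical Python dataclasses in place of PREMIS XML; only the relationshipType, relationshipSubType and relatedObjectIdentifierValue components are modelled.

```python
from dataclasses import dataclass, field
from typing import List

# The two subtypes used in the original-file and preservation-copy tables.
INVERSE_SUBTYPE = {"is source of": "has source", "has source": "is source of"}


@dataclass(frozen=True)
class Relationship:
    relationship_type: str     # relationshipType, e.g. "derivation"
    relationship_subtype: str  # relationshipSubType
    related_object_uuid: str   # relatedObjectIdentifierValue (type UUID)


@dataclass
class ObjectRecord:
    uuid: str                  # objectIdentifierValue
    relationships: List[Relationship] = field(default_factory=list)


def derivation_is_reciprocal(a: ObjectRecord, b: ObjectRecord) -> bool:
    """True if a links to b by derivation and b carries the inverse link back to a."""
    links = [r for r in a.relationships
             if r.relationship_type == "derivation" and r.related_object_uuid == b.uuid]
    if not links:
        return False
    for link in links:
        inverse = INVERSE_SUBTYPE.get(link.relationship_subtype)
        if inverse is None:
            return False  # unknown subtype, so no inverse can be checked
        if Relationship("derivation", inverse, a.uuid) not in b.relationships:
            return False
    return True


original = ObjectRecord("0db50321-6d7b-4291-89ec-a8b0adc1ff96",
                        [Relationship("derivation", "is source of",
                                      "270bd067-0483-4c5f-bdec-f2cbd6e651aa")])
preservation = ObjectRecord("270bd067-0483-4c5f-bdec-f2cbd6e651aa",
                            [Relationship("derivation", "has source",
                                          "0db50321-6d7b-4291-89ec-a8b0adc1ff96")])
print(derivation_is_reciprocal(original, preservation))  # True
```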
 
 
 
<br>
 
 
 
==Events requiring metadata==
 
 
 
===Receive SIP (SIP gets placed in 1-receiveSIP)===
 
 
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|user account
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|demo
 
|Y

|
 
|-
 
|3.1.1 agentIdentifierType
 
|workstation id
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|archivematica-1
 
|Y
 
|
 
|-
 
|}
 
 
 
<br>
 
 
 
 
 
===Check checksums===
 
 
 
Metadata for each file in the SIP
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|software
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|MD5sum
 
|Y
 
|
 
|-
 
|2.5.1 eventOutcome
 
|Pass; fail
 
|Y
 
|
 
|-
 
|2.5.2 eventOutcomeDetail
 
|j6059_02.wav FAILED
 
|Y
 
|
 
|-
 
|}
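The check can be sketched with Python's hashlib: read the expected digests from an md5sum-style manifest, recompute each file's MD5, and report one eventOutcome for the SIP, listing the failing files in eventOutcomeDetail in the style of the sample value above. The manifest format, the function names and the lowercase "pass"/"fail" values are assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_manifest(text: str) -> Dict[str, str]:
    """Parse md5sum-style lines: "<digest>  <name>" or "<digest> *<name>"."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        if name.startswith("*"):
            name = name[1:]  # "*" marks binary mode in md5sum output
        manifest[name] = digest.lower()
    return manifest


def check_checksums(sip_dir: Path, manifest: Dict[str, str]) -> Tuple[str, str]:
    """Return (eventOutcome, eventOutcomeDetail) for one SIP."""
    failures = []
    for name, expected in sorted(manifest.items()):
        path = sip_dir / name
        # A missing file counts as a failure, as does a digest mismatch.
        if not path.is_file() or md5_of(path) != expected.lower():
            failures.append(f"{name} FAILED")
    if failures:
        return "fail", "; ".join(failures)
    return "pass", ""
```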
 
 
 
<br>
 
 
 
===Generate checksums===
 
 
 
Metadata for each file in the SIP for which a checksum is generated by Archivematica
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|software
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|MD5sum
 
|Y
 
|
 
|-
 
|1.5.2.1 messageDigestAlgorithm
 
|MD5
 
|Y
 
|
 
|-
 
|1.5.2.2 messageDigest
 
|fa10ee76a575bafe43335abf6cd60bae
 
|Y
 
|
 
|-
 
|1.5.2.3 messageDigestOriginator
 
|City of Vancouver
 
|Y
 
|
 
|}
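A sketch of the per-file output of this step. It assumes Python's hashlib in place of the MD5sum tool named in the table, so the agent value records that instead, and it uses a UUID as the event identifier, which is an assumption since the table does not fix the identifier scheme. The dictionary keys mirror the semantic components listed above; the function name is hypothetical.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Tuple


def generate_checksum(path: Path, originator: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (fixity, event) dictionaries for one file."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    fixity = {
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
        "messageDigestOriginator": originator,
    }
    event = {
        "eventIdentifierType": "UUID",            # assumption; see lead-in
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": "message digest calculation",
        # ISO 8601 timestamp with an explicit UTC offset (+00:00).
        "eventDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agentIdentifierType": "software",
        # The table's sample value is MD5sum; this sketch uses hashlib instead.
        "agentIdentifierValue": "Python hashlib",
    }
    return fixity, event
```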
 
 
 
 
 
<br>
 
===Review SIP===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|user account
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|demo
 
|Y

|

|-

|2.5.1 eventOutcome

|{pass; conditional pass}

|N

|A SIP that fails review does not move on to become an AIP, so "fail" is not a possible outcome

|-

|2.5.2 eventOutcomeDetail

|Some files missing; appraisal required

|N

|This field is mandatory if eventOutcome = conditional pass
 
|-
 
|}
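The two rules in this table (no "fail" outcome, and a mandatory eventOutcomeDetail for a conditional pass) can be expressed as a small validation function. This is a hypothetical helper for illustration, not existing Archivematica code.

```python
from typing import List, Optional

# A SIP that fails review never becomes an AIP, so "fail" is not recorded.
REVIEW_OUTCOMES = {"pass", "conditional pass"}


def review_outcome_errors(outcome: str, detail: Optional[str]) -> List[str]:
    """Return a list of problems with a Review SIP outcome; empty means valid."""
    errors = []
    if outcome not in REVIEW_OUTCOMES:
        errors.append(f"unexpected eventOutcome {outcome!r}")
    if outcome == "conditional pass" and not (detail and detail.strip()):
        errors.append("eventOutcomeDetail is required for a conditional pass")
    return errors


print(review_outcome_errors("conditional pass", "Some files missing; appraisal required"))  # []
print(review_outcome_errors("conditional pass", None))
```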
 
 
 
<br>
 
 
 
===Quarantine SIP===
 
-date and time the SIP entered quarantine, and date and time it was released
 
 
 
===Unpack zipped files===
 
-tool used, time unpacked, event outcome (successful or not), and a map of each zipped file to its unzipped contents (one entry per unzipped file, linked to the unpacking event)
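The map from an archive to its contents might be captured at extraction time, as in this sketch using Python's zipfile module. Only zip archives are handled, and the record keys (sourceArchive, member, extractedPath, eventIdentifierValue) are hypothetical names, not PREMIS elements.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def unpack_with_map(archive: Path, dest: Path, event_id: str) -> List[Dict[str, str]]:
    """Extract a zip archive and return one record per extracted file.

    Each record links the archive, the member name inside it, the
    extracted path (relative to dest) and the unpacking event.
    """
    records = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            # ZipFile.extract drops absolute paths and ".." components.
            extracted = Path(zf.extract(info, dest))
            records.append({
                "sourceArchive": archive.name,
                "member": info.filename,
                "extractedPath": extracted.relative_to(dest).as_posix(),
                "eventIdentifierValue": event_id,
            })
    return records
```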
 
 
 
===Assign UUIDs===
 
-standard event metadata (identifier, type, date/time, agent, outcome), plus a map from each original file name to its UUID
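A minimal sketch of the name-to-UUID map, assuming random (version 4) UUIDs and original names given as paths relative to the SIP, so that files with the same name in different directories stay distinct. The function name is hypothetical.

```python
import uuid
from typing import Dict, Iterable


def assign_uuids(original_names: Iterable[str]) -> Dict[str, str]:
    """Map each original file name (relative path) to a new version 4 UUID."""
    mapping: Dict[str, str] = {}
    for name in original_names:
        if name in mapping:
            # The same relative path twice would make the map ambiguous.
            raise ValueError(f"duplicate file name {name!r}")
        mapping[name] = str(uuid.uuid4())
    return mapping


print(assign_uuids(["objects/sample.bmp", "objects/j6059_02.wav"]))
```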
 
 
 
===Remove prohibited characters===

-standard event metadata, plus a map from each original file name to its sanitized name
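What counts as a prohibited character is not defined on this page. The sketch below assumes a hypothetical policy that allows only ASCII letters, digits, period, underscore and hyphen, replaces anything else with an underscore, and appends a counter when two originals collapse to the same sanitized name. It works on single file names, not full paths (a "/" would be replaced).

```python
import os
import re
from typing import Dict, Iterable

# Hypothetical policy: any character outside this set counts as prohibited.
PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_names(names: Iterable[str]) -> Dict[str, str]:
    """Map each original name to a sanitized name, keeping results unique."""
    mapping: Dict[str, str] = {}
    used = set()
    for original in names:
        candidate = PROHIBITED.sub("_", original)
        stem, ext = os.path.splitext(candidate)
        counter = 1
        # Two originals can collapse to the same sanitized name; add a suffix.
        while candidate in used:
            candidate = f"{stem}_{counter}{ext}"
            counter += 1
        used.add(candidate)
        mapping[original] = candidate
    return mapping


print(sanitize_names(["my file.tif", "my_file.tif", "café.wav"]))
```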
 
 
 
===Virus scan===

-standard event metadata and the result for each file (include eventOutcomeDetail to describe the type of failure, such as the type of malware found)
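Per-file scan results could be turned into event records along these lines. The scanner is treated as a black box that reports either a malware name or nothing for each file; the eventType value "virus check", the record layout and the function name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def virus_scan_events(results: Dict[str, Optional[str]], scanner: str) -> List[Dict[str, str]]:
    """Build one event record per scanned file.

    results maps a file identifier to the malware name reported by the
    scanner, or None when the file is clean.
    """
    events = []
    for file_id, malware in sorted(results.items()):
        event = {
            "eventType": "virus check",               # assumed vocabulary term
            "linkingObjectIdentifierValue": file_id,
            "agentIdentifierType": "software",
            "agentIdentifierValue": scanner,
            "eventOutcome": "pass" if malware is None else "fail",
        }
        if malware is not None:
            # eventOutcomeDetail records what kind of failure occurred.
            event["eventOutcomeDetail"] = f"malware found: {malware}"
        events.append(event)
    return events
```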
 
 
 
===File characterization===

-identification: format name, format version, registry name, registry key

-validation: well-formed? valid?
 
 
 
===Appraise SIP===

-standard event metadata
 
-event outcome (no files removed; some files removed)
 
-list of files removed
 
 
 
===Normalization to preservation formats===

-everything already in the preservation copy table above, plus identification information: format name, format version, registry name, registry key
 
 
 
===Normalization to access formats===
 
 
 
<br>
 
 
 
==Mandatory PREMIS elements (mandatory semantic units + mandatory components)==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Present in Archivematica?'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|No
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|Yes
 
|-
 
|Object
 
|1.2 objectCategory
 
|none
 
|No
 
|-
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.1 compositionLevel
 
|No
 
|-
 
|Object
 
|1.5.4 objectCharacteristics/format
 
|Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used
 
|
 
*1.5.4.1.1 formatName Yes
 
*1.5.4.2.1 formatRegistryName No
 
*1.5.4.2.2 formatRegistryKey Yes
 
|-
 
|Object
 
|1.7 storage
 
|Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded."
 
|No, but retrieval may be managed through UUIDs.
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|No
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|No
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Partial
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|Partial
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|No
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|No
 
|}
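The table can double as a checklist. This sketch tests a flattened record (component name to string value) against the mandatory components listed above, including the either/or rules for format and storage. By default it skips the storage rule, on the assumption quoted above that the objectIdentifier serves as the retrieval handle. The names MANDATORY and missing_mandatory are hypothetical.

```python
from typing import Dict, List

# Mandatory components per entity, as listed in the table above.
# Format and storage have either/or rules and are checked separately.
MANDATORY = {
    "Object": ["objectIdentifierType", "objectIdentifierValue",
               "objectCategory", "compositionLevel"],
    "Event": ["eventIdentifierType", "eventIdentifierValue",
              "eventType", "eventDateTime"],
    "Agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def missing_mandatory(entity: str, record: Dict[str, str],
                      identifier_is_retrieval_handle: bool = True) -> List[str]:
    """List the mandatory PREMIS components absent from a flattened record."""
    missing = [name for name in MANDATORY[entity] if not record.get(name)]
    if entity == "Object":
        has_designation = bool(record.get("formatName"))
        has_registry = bool(record.get("formatRegistryName")
                            and record.get("formatRegistryKey"))
        if not (has_designation or has_registry):
            missing.append("format (formatDesignation or formatRegistry)")
        # contentLocation is implicit when the objectIdentifier is the retrieval handle.
        has_storage = bool(record.get("contentLocation") or record.get("storageMedium"))
        if not has_storage and not identifier_is_retrieval_handle:
            missing.append("storage (contentLocation or storageMedium)")
    return missing
```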
 
 
 
<br>
 
 
 
==PREMIS elements relating to derived objects==
 
 
 
Since AIPs are constructed from both original and normalized files, we need to determine what PREMIS elements should be used to describe the normalized files and their relationship to the originals.
 
 
 
===Original file metadata===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|is source of
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the normalized file)
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Normalization
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|2010-05-19T00:49:15+00:00
 
|-
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|Processing completed
 
|-
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|Output #0, wav, to '/tmp/MultimediaSIP-9ece5881-640e-4bdc-9863-4ff50046a0bd/objects/sample.wav': Stream #0.0: Audio: pcm_s16le, 8000 Hz, stereo, s16, 256 kb/s
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|
 
|-
 
|Agent
 
|3.2 agentName
 
|none
 
|FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2
 
|-
 
|}
 
 
 
<br>
 
 
 
===Normalized file metadata===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|has source
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the original file)
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.1 relatedEventIdentifierType
 
|
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.2 relatedEventIdentifierValue
 
|
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.1 messageDigestAlgorithm
 
|MD5
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|(MD5 checksum of the normalized file)
 
|-
 
|}
 
  
 
[[Category:Development documentation]]
 
 
 
__NOTOC__
 

Latest revision as of 16:34, 11 February 2020
