Difference between revisions of "Metadata elements"

 
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Metadata elements
 
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information.</div>
  
 
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
 
  
This process involves:

# Using the InterPARES Chain of Preservation (COP) model and the COP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
# Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see [[Existing elements]])
# Comparing the results of steps 1 and 2 to determine what gaps exist in Archivematica
# Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
# Structuring the required elements into the [http://wiki.fcla.edu:8000/TIPR/21 Repository eXchange Package (RXP) specification]
# Determining what metadata belongs in the DIP(s)

Related pages:

*[[METS]]
*[[PREMIS]]
*[[PREMIS metadata: original files]]
*[[PREMIS metadata: normalized files]]
*[[PREMIS metadata: events]]
*[[PREMIS metadata: rights - 0.10]]
*[[PREMIS/METS for scalability]]
*[[RDF/OWL]]
 
 
 
 
<br>
 
 
 
==Proposed PREMIS metadata for original file==
 
 
 
This table is a template for metadata elements for the original file. Please note the following:
 
*The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
 
*For most files, the relationship semantic unit would be used twice: once for the preservation copy and once for the access copy.
 
*The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events
 
 
 
<br>
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|objectIdentifier
 
|objectIdentifierType
 
|UUID
 
|mandatory unit and component
 
|-
 
|objectIdentifier
 
|objectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|mandatory unit and component
 
|-
 
|objectCategory
 
|none
 
|file
 
|mandatory unit and component
 
|-
 
|significantProperties
 
|significantPropertiesType
 
|ImageWidth
 
|repeatable semantic unit
 
|-
 
|significantProperties
 
|significantPropertiesValue
 
|1024
 
|repeatable semantic unit
 
|-
 
|objectCharacteristics
 
|compositionLevel
 
|0
 
|mandatory unit and component
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestAlgorithm
 
|MD5
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigest
 
|e479688508922354bdab09bca60d8d0e
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestOriginator
 
|City of Vancouver
 
|
 
|-
 
|objectCharacteristics
 
|size
 
|787510
 
|
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatName
 
|Windows Bitmap
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatVersion
 
|3.0
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryName
 
|PRONOM
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryKey
 
|fmt/116
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|relationship
 
|relationshipType
 
|derivation
 
|
 
|-
 
|relationship
 
|relationshipSubType
 
|is source of
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierType
 
|UUID
 
|mandatory unit and component if there is a related object
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|mandatory unit and component if there is a related object
 
|-
 
|relationship/relatedEventIdentification
 
|relatedEventIdentifierType
 
|Archivematica ID
 
|"For derivative relationships between objects relatedEventIdentification must be recorded."
 
|-
 
|relationship/relatedEventIdentification
 
|relatedEventIdentifierValue
 
|006
 
|"For derivative relationships between objects relatedEventIdentification must be recorded."
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|006
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|normalization
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2009-12-01T09:09:00-02:00
 
|mandatory unit and component
 
|-
 
|eventDetail
 
|none
 
|program=ImageMagick; version=6.6.4.0
 
|This element can be used to record information about software used and eliminates the need to have agent entities for software programs
 
|-
 
|eventOutcomeInformation
 
|eventOutcome
 
|normalization successful
 
|
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierType
 
|repository ID
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierValue
 
|CVA
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|agentIdentifier
 
|agentIdentifierType
 
|repository code
 
|mandatory unit and component
 
|-
 
|agentIdentifier
 
|agentIdentifierValue
 
|CVA
 
|mandatory unit and component
 
|-
 
|agentName
 
|none
 
|City of Vancouver Archives
 
|
 
|-
 
|agentType
 
|none
 
|organization
 
|
 
|-
 
|}
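
As a rough illustration of how the repeatable units in this template could be assembled, the following Python sketch builds a PREMIS-style object element with the standard library. The function name <code>build_object</code> and its argument layout are hypothetical. The sketch leaves out the PREMIS XML namespace and the objectCharacteristics unit for brevity, so the output is not schema-valid PREMIS.

```python
import xml.etree.ElementTree as ET


def build_object(object_uuid, significant_properties, derivatives):
    """Build a PREMIS-style <object> element (hypothetical helper).

    significant_properties: iterable of (type, value) pairs.
    derivatives: iterable of (related object UUID, event identifier value) pairs.
    """
    obj = ET.Element("object")

    identifier = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(identifier, "objectIdentifierType").text = "UUID"
    ET.SubElement(identifier, "objectIdentifierValue").text = object_uuid
    ET.SubElement(obj, "objectCategory").text = "file"

    # significantProperties is repeated once per property.
    for prop_type, prop_value in significant_properties:
        prop = ET.SubElement(obj, "significantProperties")
        ET.SubElement(prop, "significantPropertiesType").text = prop_type
        ET.SubElement(prop, "significantPropertiesValue").text = str(prop_value)

    # relationship is repeated once per derived copy (preservation, access).
    for related_uuid, event_id in derivatives:
        rel = ET.SubElement(obj, "relationship")
        ET.SubElement(rel, "relationshipType").text = "derivation"
        ET.SubElement(rel, "relationshipSubType").text = "is source of"
        related = ET.SubElement(rel, "relatedObjectIdentification")
        ET.SubElement(related, "relatedObjectIdentifierType").text = "UUID"
        ET.SubElement(related, "relatedObjectIdentifierValue").text = related_uuid
        event = ET.SubElement(rel, "relatedEventIdentification")
        ET.SubElement(event, "relatedEventIdentifierType").text = "Archivematica ID"
        ET.SubElement(event, "relatedEventIdentifierValue").text = event_id

    return obj
```

With the sample values from the table, <code>build_object("0db50321-6d7b-4291-89ec-a8b0adc1ff96", [("ImageWidth", 1024)], [("270bd067-0483-4c5f-bdec-f2cbd6e651aa", "006")])</code> returns an element that <code>ET.tostring(obj, encoding="unicode")</code> can serialize. A second pair in the last argument would add the relationship for the access copy.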
 
 
 
<br>
 
 
 
==Proposed PREMIS metadata for normalized file (preservation copy)==
 
Unlike the table above, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and checksum generation.
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|objectIdentifier
 
|objectIdentifierType
 
|UUID
 
|mandatory unit and component
 
|-
 
|objectIdentifier
 
|objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|mandatory unit and component
 
|-
 
|objectCategory
 
|none
 
|file
 
|mandatory unit and component
 
|-
 
|objectCharacteristics
 
|compositionLevel
 
|0
 
|mandatory unit and component
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestAlgorithm
 
|MD5
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigest
 
|e479688508922354bdab09bca60d8d0e
 
|
 
|-
 
|objectCharacteristics/fixity
 
|messageDigestOriginator
 
|City of Vancouver
 
|
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatName
 
|Tagged Image File Format
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatDesignation
 
|formatVersion
 
|6.0
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryName
 
|PRONOM
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|objectCharacteristics/format/formatRegistry
 
|formatRegistryKey
 
|fmt/10
 
|format is a mandatory unit; must use either formatDesignation or formatRegistry
 
|-
 
|relationship
 
|relationshipType
 
|derivation
 
|
 
|-
 
|relationship
 
|relationshipSubType
 
|has source
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierType
 
|UUID
 
|
 
|-
 
|relationship/relatedObjectIdentification
 
|relatedObjectIdentifierValue
 
|0db50321-6d7b-4291-89ec-a8b0adc1ff96
 
|
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|006
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|creation
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2010-08-01T09:08:44-03:00
 
|mandatory unit and component
 
|-
 
|eventDetail
 
|none
 
|program=ImageMagick; version=6.6.4.0; command=%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%
 
|
 
|-
 
|eventIdentifier
 
|eventIdentifierType
 
|Archivematica ID
 
|mandatory unit and component
 
|-
 
|eventIdentifier
 
|eventIdentifierValue
 
|002
 
|mandatory unit and component
 
|-
 
|eventType
 
|none
 
|message digest calculation
 
|mandatory unit and component
 
|-
 
|eventDateTime
 
|none
 
|2010-08-01T09:08:46-01:00
 
|mandatory unit and component
 
|-
 
|eventDetail
 
|none
 
|program="MD5deep"; version="3.6"
 
|
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierType
 
|repository ID
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|linkingAgentIdentifier
 
|linkingAgentIdentifierValue
 
|CVA
 
|used to link an agent to an event; not mandatory but recommended
 
|-
 
|agentIdentifier
 
|agentIdentifierType
 
|repository code
 
|mandatory unit and component
 
|-
 
|agentIdentifier
 
|agentIdentifierValue
 
|CVA
 
|mandatory unit and component
 
|-
 
|agentName
 
|none
 
|City of Vancouver Archives
 
|
 
|-
 
|agentType
 
|none
 
|organization
 
|
 
|-
 
|}
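
The eventDetail samples in the tables pack several values into one string, sometimes quoted (<code>program="MD5deep"; version="3.6"</code>) and sometimes not. The sketch below shows one way such a string might be split back into named fields. The function name <code>parse_event_detail</code> is hypothetical, and the "key=value; key=value" convention is inferred from the samples rather than from any specification.

```python
def parse_event_detail(detail):
    """Split a 'key=value; key=value' eventDetail string into a dict.

    Assumption: fields are separated by ';' and values never contain ';'.
    Surrounding double quotes on a value are stripped.
    """
    fields = {}
    for part in detail.split(";"):
        part = part.strip()
        if not part:
            continue
        key, separator, value = part.partition("=")
        if not separator:
            # Not a key=value pair; skip it rather than guess.
            continue
        fields[key.strip()] = value.strip().strip('"')
    return fields
```

A value that itself contains a semicolon would be split incorrectly, so a real implementation would need an agreed escaping rule or a structured alternative to a single string.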
 
 
 
<br>
 
 
 
==Events requiring metadata==
 
 
 
===Receive SIP (SIP gets placed in 1-receiveSIP)===
 
 
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|user account
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|demo
 
|Y

|
 
|-
 
|3.1.1 agentIdentifierType
 
|workstation id
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|archivematica-1
 
|Y
 
|
 
|-
 
|}
 
 
 
<br>
 
 
 
 
 
===Check checksums===
 
 
 
Metadata for each file in the SIP
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|software
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|MD5sum
 
|Y
 
|
 
|-
 
|2.5.1 eventOutcome
 
|Pass; fail
 
|Y
 
|
 
|-
 
|2.5.2 eventOutcomeDetail
 
|j6059_02.wav FAILED
 
|Y
 
|
 
|-
 
|}
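
A minimal sketch of how the outcome fields in this table might be produced for one file. It uses Python's <code>hashlib</code> rather than the MD5sum tool named in the table, and the function names and the flat dictionary layout are assumptions. The outcome strings follow the sample values above.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_checksum(path, expected_md5):
    """Compare a file against its expected MD5 and describe the outcome."""
    passed = md5_of_file(path) == expected_md5.strip().lower()
    return {
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": "Pass" if passed else "fail",
        # Mirrors the sample detail "j6059_02.wav FAILED".
        "eventOutcomeDetail": "" if passed else os.path.basename(path) + " FAILED",
    }
```

The event identifier and event type are left out because the table gives no sample values for them.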
 
 
 
<br>
 
 
 
===Generate checksums===
 
 
 
Metadata for each file in the SIP for which a checksum is generated by Archivematica
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|software
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|MD5sum
 
|Y
 
|
 
|-
 
|1.5.2.1 messageDigestAlgorithm
 
|MD5
 
|Y
 
|
 
|-
 
|1.5.2.2 messageDigest
 
|fa10ee76a575bafe43335abf6cd60bae
 
|Y
 
|
 
|-
 
|1.5.2.3 messageDigestOriginator
 
|City of Vancouver
 
|Y
 
|
 
|}
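
The three fixity components in this table could be filled in along the following lines. This is a sketch only: <code>generate_fixity</code> is a hypothetical name, and it hashes bytes already in memory, whereas large files would normally be read in chunks.

```python
import hashlib


def generate_fixity(data, originator):
    """Return the fixity components for a byte string (hypothetical helper)."""
    return {
        "messageDigestAlgorithm": "MD5",
        "messageDigest": hashlib.md5(data).hexdigest(),
        "messageDigestOriginator": originator,
    }
```

For example, <code>generate_fixity(open(path, "rb").read(), "City of Vancouver")</code> would yield a dictionary whose keys match the semantic components listed above.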
 
 
 
 
 
<br>
 
===Review SIP===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:10%"|'''Automated?'''
 
!style="width:20%"|'''Notes'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|Y
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|Y
 
|
 
|-
 
|2.2 eventType
 
|
 
|Y
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|Y
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|user account
 
|Y
 
|
 
|-
 
|3.1.2 agentIdentifierValue
 
|demo
 
|Y

|
 
|-
 
|2.5.1 eventOutcome

|{pass; conditional pass}

|N

|A SIP that fails review does not go on to become an AIP, so "fail" is never recorded as an outcome
 
|-
 
|2.5.2 eventOutcomeDetail
 
|Some files missing; appraisal required
 
|N
 
|This field is mandatory if eventOutcome = conditional pass
 
|-
 
|}
 
 
 
<br>
 
 
 
===Quarantine SIP===
 
*Date and time the SIP entered quarantine and date and time it was released
 
 
 
===Unpack zipped files===
 
*Tool used, time unpacked, event outcome (success or failure), and a map from the zipped file to its unzipped contents (one entry per unzipped file, each linked to the unpacking event)
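
One possible way to capture the map from a zipped file to its unzipped contents, sketched with Python's standard <code>zipfile</code> module. The function name <code>unpack_with_map</code> and the shape of the returned mapping are assumptions, and the tool Archivematica uses for unpacking is not stated here.

```python
import os
import zipfile


def unpack_with_map(zip_path, target_dir):
    """Extract a zip file and map each member name to its extracted path."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries are skipped; only files are mapped.
        members = [info.filename for info in archive.infolist() if not info.is_dir()]
        archive.extractall(target_dir)
    return {name: os.path.join(target_dir, name) for name in members}
```

Each entry in the returned dictionary could then be recorded alongside the identifier of the unpacking event.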
 
 
 
===Assign UUIDs===
 
*Standard event metadata (event identifier, type, date/time and agent), plus a map from each original file name to its UUID
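
A minimal sketch of the name-to-UUID map, assuming random (version 4) UUIDs, which the sample identifiers on this page appear to be. The function name <code>assign_uuids</code> is hypothetical.

```python
import uuid


def assign_uuids(original_names):
    """Map each original file name to a newly generated version 4 UUID string."""
    return {name: str(uuid.uuid4()) for name in original_names}
```

Because the UUIDs are random, the map has to be stored when it is created; it cannot be regenerated later from the file names.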
 
 
 
===Remove prohibited characters===
 
*Standard event metadata (event identifier, type, date/time and agent), plus a map from each original file name to its sanitized name
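
The map from original to sanitized names might be built as in the sketch below. Which characters count as prohibited is not defined on this page, so the rule used here (anything other than ASCII letters, digits, dot, underscore and hyphen is replaced with an underscore) is purely an assumption, as is the function name.

```python
import re

# Assumption: every character outside this set is treated as prohibited.
PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_names(original_names):
    """Map each original file name to a sanitized version of it."""
    return {name: PROHIBITED.sub("_", name) for name in original_names}
```

Two different original names can sanitize to the same result, so a fuller implementation would need to detect and resolve such collisions.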
 
 
 
===Virus scan===

*Standard event metadata (event identifier, type, date/time and agent), plus a result for each file (use eventOutcomeDetail to describe the type of failure, such as the type of malware found)
 
 
 
===File characterization===

*Identification: format name, format version, registry name, registry key

*Validation: whether the file is well-formed and whether it is valid
 
 
 
===Appraise SIP===

*Standard event metadata (event identifier, type, date/time and agent)

*Event outcome (no files removed; some files removed)

*List of files removed
 
 
 
===Normalization to preservation formats===

*Everything already in the normalized file table, plus identification information: format name, format version, registry name, registry key
 
 
 
===Normalization to access formats===
 
 
 
<br>
 
 
 
==Mandatory PREMIS elements (mandatory semantic units + mandatory components)==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Present in Archivematica?'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|No
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|Yes
 
|-
 
|Object
 
|1.2 objectCategory
 
|none
 
|No
 
|-
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.1 compositionLevel
 
|No
 
|-
 
|Object
 
|1.5.4 objectCharacteristics/format
 
|Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used
 
|
 
*1.5.4.1.1 formatName: Yes

*1.5.4.2.1 formatRegistryName: No

*1.5.4.2.2 formatRegistryKey: Yes
 
|-
 
|Object
 
|1.7 storage
 
|Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded."
 
|No, but retrieval may be managed through UUIDs.
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|No
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|No
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Partial
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|Partial
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|No
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|No
 
|}
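
The gap analysis in this table can be thought of as a check of each record against the list of mandatory components. The sketch below illustrates the idea on flat dictionaries. The record layout, the names <code>MANDATORY</code> and <code>missing_mandatory</code>, and the decision to skip the storage unit (on the grounds quoted above, that contentLocation is implicit when retrieval uses the objectIdentifier) are all assumptions.

```python
# Mandatory components per entity, taken from the table above.
MANDATORY = {
    "object": ["objectIdentifierType", "objectIdentifierValue",
               "objectCategory", "compositionLevel"],
    "event": ["eventIdentifierType", "eventIdentifierValue",
              "eventType", "eventDateTime"],
    "agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def is_present(record, name):
    """A component counts as present unless it is absent, None or empty."""
    return record.get(name) not in (None, "")


def missing_mandatory(entity, record):
    """List the mandatory components that a flat record lacks."""
    missing = [name for name in MANDATORY[entity] if not is_present(record, name)]
    if entity == "object":
        # format is mandatory: either formatDesignation or formatRegistry.
        has_designation = is_present(record, "formatName")
        has_registry = (is_present(record, "formatRegistryName")
                        and is_present(record, "formatRegistryKey"))
        if not (has_designation or has_registry):
            missing.append("format")
    return missing
```

A compositionLevel of 0 is treated as present, which is why the check compares against None and the empty string instead of relying on truthiness.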
 
 
 
  
 
[[Category:Development documentation]]
 
 
 
__NOTOC__
 

Latest revision as of 16:34, 11 February 2020
