[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Metadata elements
 
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
  
 
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
 
  
This process involves:

# Using the InterPARES Chain of Preservation (CoP) model and the CoP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
# Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements
# Comparing 1) to 2) in order to determine what gaps exist in Archivematica
# Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
# Structuring the required elements into the [http://wiki.fcla.edu:8000/TIPR/21 Repository eXchange Package (RXP) specification]
# Determining what metadata belongs in the DIP(s)

*[[METS]]
*[[PREMIS]]
*[[PREMIS metadata: original files]]
*[[PREMIS metadata: normalized files]]
*[[PREMIS metadata: events]]
*[[PREMIS metadata: rights - 0.10]]
*[[PREMIS/METS for scalability]]
*[[RDF/OWL]]
 
 
==Map of Archivematica 0.6 metadata to PREMIS elements==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/MD5checksum.txt'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when quarantine period expires. Provides checksums for each object in the SIP. Note that if zipped files are present, a checksum is generated for the zipped file and not for each object within it.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|326e0206ae83f815e4be5f28464f6ac6
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/filenameCleanup.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when quarantine period expires, prior to unpacking of any zipped files. If prohibited characters were present in filenames, provides crosswalk between original and "cleaned up" filenames.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Original filename
 
|Object
 
|1.6 originalName
 
|none
 
|Syllabus final.doc
 
|-
 
|Cleaned-up filename
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|Syllabus_final.doc
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/virusScan.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when ingested files are scanned for viruses and malware'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Scan result
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|OK
 
|-
 
|colspan="6" style="background-color:silver;"|'''Source: /data/logs/fileUUIDs.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced after prohibited characters are removed from filenames and any zipped files have been unpacked. Provides a crosswalk between cleaned-up filenames and UUIDs.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Universal unique identifier (UUID)
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/FITS-[UUID]-[SIP].xml (FITS output reports)'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when the FITS tool identifies and validates formats and extracts technical metadata'''
 
|-
 
!style="width:20%"|FITS element
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|format
 
|Object
 
|1.5.4.1 formatDesignation
 
|1.5.4.1.1 formatName
 
|
 
*Tagged Image File Format
 
*Waveform Audio
 
*Microsoft Powerpoint Presentation
 
|-
 
|version
 
|Object
 
|1.5.4.1 formatDesignation
 
|1.5.4.1.2 formatVersion
 
|6.0
 
|-
 
|externalIdentifier
 
|Object
 
|1.5.4.2 formatRegistry
 
|1.5.4.2.2 formatRegistryKey
 
|fmt/10
 
|-
 
|Size
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.3 size
 
|125968
 
|-
 
|ImageWidth (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2464
 
|-
 
|ImageHeight (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|3248
 
|-
 
|SamplesPerPixel (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|3
 
|-
 
|XResolution (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|300
 
|-
 
|YResolution (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|300
 
|-
 
|duration (audio files and video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|0:2:26:16
 
|-
 
|bitDepth/bitsPerSample (image files, audio files, video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|16
 
|-
 
|sampleRate (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|48000.0
 
|-
 
|channels (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2
 
|-
 
|aes:channelAssignment (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|
 
*channelNum="0" mapLocation="LEFT"
 
*channelNum="1" mapLocation="RIGHT"
 
|-
 
|VideoFrameRate (video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|
 
*30.0
 
*29.97 fps
 
|-
 
|AspectRatio (video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|1:1
 
|-
 
|AudioFormat (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|raw
 
|-
 
|AudioChannels (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2
 
|-
 
|AudioBitsPerSample (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|8
 
|-
 
|AudioSampleRate (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|44100
 
|-
 
|PageCount (text files, office documents, pdf files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|16
 
|-
 
|WordCount (text files, office documents)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|876
 
|-
 
|Paragraphs (text files, office documents)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|19
 
|-
 
|Slides (presentation files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|27
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/normalization.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced during normalization to preservation and access formats'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Name of normalization tool
 
|Agent
 
|3.2 agentName
 
|none
 
|FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2
 
|-
 
|Event description
 
|Event
 
|2.2 eventType
 
|none
 
|Normalizing
 
|-
 
|Processing status
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|Processing completed
 
|-
 
|Normalization result
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|
 
*Already in preservation format. No need to normalize.
 
*No default normalization tool defined.
 
*Output #0, wav, to '/tmp/MultimediaSIP-9ece5881-640e-4bdc-9863-4ff50046a0bd/objects/sample.wav': Stream #0.0: Audio: pcm_s16le, 8000 Hz, stereo, s16, 256 kb/s
 
|-
 
|colspan="6" style="background-color:silver;"|'''Source: /data/logs/MD5checksum.txtprepareAIP_check.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced after file normalization process. Checks that checksums for files in the SIP have not changed during normalization.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Pass/fail notification
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|
 
*PASSED
 
*FAILED
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/AIP.MD5checksum.txt'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced during BagIt process. Provides checksums for the AIP and for each original and normalized file in the AIP.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|AIP checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|12b86e038bf0bddd5aba110c35f288b8
 
|-
 
|File checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|326e0206ae83f815e4be5f28464f6ac6
 
|-
 
|}
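
The mapping in the table above can be treated as a lookup from a FITS element name to its PREMIS semantic unit and component. The following is a minimal sketch only, transcribing a subset of the rows; the names <code>FITS_TO_PREMIS</code> and <code>premis_target</code> are hypothetical and are not part of Archivematica.

```python
# Illustrative crosswalk transcribed from a subset of the rows in the table above.
# FITS_TO_PREMIS and premis_target are hypothetical names, not Archivematica code.
FITS_TO_PREMIS = {
    "format": ("1.5.4.1 formatDesignation", "1.5.4.1.1 formatName"),
    "version": ("1.5.4.1 formatDesignation", "1.5.4.1.2 formatVersion"),
    "externalIdentifier": ("1.5.4.2 formatRegistry", "1.5.4.2.2 formatRegistryKey"),
    "Size": ("1.5 objectCharacteristics", "1.5.3 size"),
}

# The table maps most technical properties to the same unit/component pair.
SIGNIFICANT_PROPERTIES = (
    "1.4 significantProperties",
    "1.4.2 significantPropertiesValue",
)
for name in ("ImageWidth", "ImageHeight", "duration", "sampleRate", "channels", "PageCount"):
    FITS_TO_PREMIS[name] = SIGNIFICANT_PROPERTIES


def premis_target(fits_element):
    """Return (semantic unit, semantic component), or None if the element is not mapped."""
    return FITS_TO_PREMIS.get(fits_element)


print(premis_target("externalIdentifier"))
print(premis_target("unknownElement"))
```

Returning <code>None</code> for unmapped elements, rather than guessing a default, keeps gaps in the crosswalk visible.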
 
<br>
 
 
 
==Events requiring metadata elements==
 
 
 
===Transfer===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Semantic component'''
 
!style="width:20%"|'''Sample value(s)'''
 
!style="width:20%"|'''Automated?'''
 
|-
 
|2.1.1 eventIdentifierType
 
|
 
|
 
|-
 
|2.1.2 eventIdentifierValue
 
|
 
|
 
|-
 
|2.2 eventType
 
|
 
|
 
|-
 
|2.3 eventDateTime
 
|
 
|
 
|-
 
|3.1.1 agentIdentifierType
 
|Transferring entity
 
|Y
 
|-
 
|3.1.2 agentIdentifierValue
 
|
 
|
 
|-
 
|}
 
 
 
#Accession
#Receive SIP (SIP gets placed in 1-receiveSIP)
#*2.1.1 eventIdentifierType
#*2.1.2 eventIdentifierValue
#*2.2 eventType ingestion
#*2.3 eventDateTime
#*3.1.1 agentIdentifierType user account
#*3.1.2 agentIdentifierValue demo
#*3.1.1 agentIdentifierType workstation id
#*3.1.2 agentIdentifierValue archivematica-1
#Review SIP
#*2.1.1 eventIdentifierType
#*2.1.2 eventIdentifierValue
#*2.2 eventType
#*2.3 eventDateTime
#*3.1.1 agentIdentifierType user account
#*3.1.2 agentIdentifierValue demo
#*2.5.1 eventOutcome {pass; conditional pass} [if it fails, it doesn't move on to become an AIP, so failure is not an option]
#*2.5.2 eventOutcomeDetail Some files missing; appraisal required [This field is mandatory if eventOutcome = conditional pass]
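
The rule for the Review SIP event (eventOutcomeDetail is mandatory when eventOutcome is "conditional pass") could be enforced when the event record is created. The sketch below is an assumption-laden illustration: the function name <code>make_review_event</code>, the use of a UUID as the event identifier, and the eventType value are all hypothetical, since the list above leaves those fields blank.

```python
import uuid
from datetime import datetime, timezone

ALLOWED_OUTCOMES = ("pass", "conditional pass")


def make_review_event(outcome, detail=None, user="demo"):
    """Build a Review SIP event record keyed by PREMIS semantic component names."""
    if outcome not in ALLOWED_OUTCOMES:
        # A failed review never becomes an AIP, so no failure outcome is recorded.
        raise ValueError("eventOutcome must be 'pass' or 'conditional pass'")
    if outcome == "conditional pass" and not detail:
        raise ValueError("eventOutcomeDetail is mandatory for a conditional pass")
    return {
        "eventIdentifierType": "UUID",  # assumption: identifier type is not specified above
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": "review SIP",  # assumption: the list leaves eventType blank
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": outcome,
        "eventOutcomeDetail": detail,
        "agentIdentifierType": "user account",
        "agentIdentifierValue": user,
    }


event = make_review_event("conditional pass", "Some files missing; appraisal required")
print(event["eventOutcome"], "-", event["eventOutcomeDetail"])
```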
 
 
 
 
 
 
 
 
 
 
 
 
 
==Mandatory PREMIS elements (mandatory semantic units + mandatory components)==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Present in Archivematica?'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|No
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|Yes
 
|-
 
|Object
 
|1.2 objectCategory
 
|none
 
|No
 
|-
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.1 compositionLevel
 
|No
 
|-
 
|Object
 
|1.5.4 objectCharacteristics/format
 
|Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used
 
|
 
*1.5.4.1.1 formatName Yes
 
*1.5.4.2.1 formatRegistryName No
 
*1.5.4.2.2 formatRegistryKey Yes
 
|-
 
|Object
 
|1.7 storage
 
|Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded."
 
|No, but retrieval may be managed through UUIDs.
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|No
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|No
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Partial
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|Partial
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|No
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|No
 
|}
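
A gap analysis against the table above can be sketched as a check for missing mandatory components in a metadata record. This is an illustration only: <code>MANDATORY</code> and <code>missing_mandatory</code> are hypothetical names, records are assumed to be plain dictionaries, and the either/or requirements for format (1.5.4) and storage (1.7) are deliberately left out for brevity.

```python
# Mandatory components transcribed from the table above.
# The either/or rules for 1.5.4 format and 1.7 storage are omitted in this sketch.
MANDATORY = {
    "Object": [
        "objectIdentifierType",
        "objectIdentifierValue",
        "objectCategory",
        "compositionLevel",
    ],
    "Event": [
        "eventIdentifierType",
        "eventIdentifierValue",
        "eventType",
        "eventDateTime",
    ],
    "Agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def missing_mandatory(entity, record):
    """Return the mandatory component names that are absent or empty in record."""
    return [name for name in MANDATORY[entity] if not record.get(name)]


# Example: an object record that carries only a UUID value.
record = {"objectIdentifierValue": "270bd067-0483-4c5f-bdec-f2cbd6e651aa"}
print(missing_mandatory("Object", record))
```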
 
 
 
<br>
 
 
 
==PREMIS elements relating to derived objects==
 
 
 
Since AIPs are constructed from both original and normalized files, we need to determine what PREMIS elements should be used to describe the normalized files and their relationship to the originals.
 
 
 
===Original file metadata===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|is source of
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the normalized file)
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Normalization
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|2010:05:19 00:49:15+00:00
 
|-
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|Processing completed
 
|-
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|Output #0, wav, to '/tmp/MultimediaSIP-9ece5881-640e-4bdc-9863-4ff50046a0bd/objects/sample.wav': Stream #0.0: Audio: pcm_s16le, 8000 Hz, stereo, s16, 256 kb/s
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|
 
|-
 
|Agent
 
|3.2 agentName
 
|none
 
|FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2
 
|-
 
|}
 
 
 
<br>
 
 
 
===Normalized file metadata===
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Example'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.1 relationshipType
 
|derivation
 
|-
 
|Object
 
|1.10 relationship
 
|1.10.2 relationshipSubType
 
|has source
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.1 relatedObjectIdentifierType
 
|UUID
 
|-
 
|Object
 
|1.10.3 relatedObjectIdentification
 
|1.10.3.2 relatedObjectIdentifierValue
 
|(UUID of the original file)
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.1 relatedEventIdentifierType
 
|
 
|-
 
|Object
 
|1.10.4 relatedEventIdentification
 
|1.10.4.2 relatedEventIdentifierValue
 
|
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.1 messageDigestAlgorithm
 
|MD5
 
|-
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|537e0206ae83f815e4fg5f28464f6rt7
 
|-
 
|}
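
The reciprocal relationships described in the two tables above ("is source of" on the original file, "has source" on the normalized file) could be serialized along the following lines. This is a simplified sketch using Python's standard <code>xml.etree.ElementTree</code>: the element names follow the semantic component names in the tables, but no XML namespace or schema is applied, and the function names and the first UUID in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def relationship_element(sub_type, related_uuid):
    """Build a simplified, un-namespaced relationship element (1.10)."""
    rel = ET.Element("relationship")
    ET.SubElement(rel, "relationshipType").text = "derivation"
    ET.SubElement(rel, "relationshipSubType").text = sub_type
    related = ET.SubElement(rel, "relatedObjectIdentification")
    ET.SubElement(related, "relatedObjectIdentifierType").text = "UUID"
    ET.SubElement(related, "relatedObjectIdentifierValue").text = related_uuid
    return rel


def link_derivation(original_uuid, normalized_uuid):
    """Return (element for the original file, element for the normalized file)."""
    return (
        relationship_element("is source of", normalized_uuid),
        relationship_element("has source", original_uuid),
    )


# The first UUID is a made-up placeholder; the second is the sample value from the table.
orig_rel, norm_rel = link_derivation(
    "11111111-2222-3333-4444-555555555555",
    "270bd067-0483-4c5f-bdec-f2cbd6e651aa",
)
print(ET.tostring(orig_rel, encoding="unicode"))
print(ET.tostring(norm_rel, encoding="unicode"))
```

Each side points at the other file's UUID, so the derivation can be followed in either direction.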
 
  
 
[[Category:Development documentation]]
 
 
 
__NOTOC__
 

Latest revision as of 16:34, 11 February 2020
