Difference between revisions of "Metadata elements"

[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Metadata elements
 
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
  
 
This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.
 
  
This process involves:

# Using the InterPARES Chain of Preservation (COP) model and the CoP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
# Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements
# Comparing the results of steps 1 and 2 in order to determine what gaps exist in Archivematica
# Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
# Structuring the required elements into the [http://wiki.fcla.edu:8000/TIPR/21 Repository eXchange Package (RXP) specification]
# Determining what metadata belongs in the DIP(s)

*[[METS]]
*[[PREMIS]]
*[[PREMIS metadata: original files]]
*[[PREMIS metadata: normalized files]]
*[[PREMIS metadata: events]]
*[[PREMIS metadata: rights - 0.10]]
*[[PREMIS/METS for scalability]]
*[[RDF/OWL]]
 
 
==Map of Archivematica 0.6 metadata to PREMIS elements==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/MD5checksum.txt'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when quarantine period expires. Provides checksums for each object in the SIP. Note that if zipped files are present, a checksum is generated for the zipped file and not for each object within it.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|326e0206ae83f815e4be5f28464f6ac6
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/filenameCleanup.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when quarantine period expires, prior to unpacking of any zipped files. If prohibited characters were present in filenames, provides crosswalk between original and "cleaned up" filenames.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Original filename
 
|Object
 
|1.6 originalName
 
|none
 
|Syllabus final.doc
 
|-
 
|Cleaned-up filename
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|Syllabus_final.doc
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/virusScan.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when ingested files are scanned for viruses and malware'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Scan result
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|OK
 
|-
 
|colspan="6" style="background-color:silver;"|'''Source: /data/logs/fileUUIDs.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced after prohibited characters are removed from filenames and any zipped files have been unpacked. Provides a crosswalk between cleaned-up filenames and UUIDs.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Universally unique identifier (UUID)
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|270bd067-0483-4c5f-bdec-f2cbd6e651aa
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/FITS-[UUID]-[SIP].xml (FITS output reports)'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced when the FITS tool identifies and validates formats and extracts technical metadata'''
 
|-
 
!style="width:20%"|FITS element
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|format
 
|Object
 
|1.5.4.1 formatDesignation
 
|1.5.4.1.1 formatName
 
|
 
*Tagged Image File Format
 
*Waveform Audio
 
*Microsoft Powerpoint Presentation
 
|-
 
|version
 
|Object
 
|1.5.4.1 formatDesignation
 
|1.5.4.1.2 formatVersion
 
|6.0
 
|-
 
|externalIdentifier
 
|Object
 
|1.5.4.2 formatRegistry
 
|1.5.4.2.2 formatRegistryKey
 
|fmt/10
 
|-
 
|Size
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.3 size
 
|125968
 
|-
 
|ImageWidth (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2464
 
|-
 
|ImageHeight (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|3248
 
|-
 
|SamplesPerPixel (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|3
 
|-
 
|XResolution (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|300
 
|-
 
|YResolution (image files and video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|300
 
|-
 
|duration (audio files and video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|0:2:26:16
 
|-
 
|bitDepth/bitsPerSample (image files, audio files, video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|16
 
|-
 
|sampleRate (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|48000.0
 
|-
 
|channels (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2
 
|-
 
|aes:channelAssignment (audio files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|
 
*channelNum="0" mapLocation="LEFT"
 
*channelNum="1" mapLocation="RIGHT"
 
|-
 
|VideoFrameRate (video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|
 
*30.0
 
*29.97 fps
 
|-
 
|AspectRatio (video streams)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|1:1
 
|-
 
|AudioFormat (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|raw
 
|-
 
|AudioChannels (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|2
 
|-
 
|AudioBitsPerSample (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|8
 
|-
 
|AudioSampleRate (audio streams in video files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|44100
 
|-
 
|PageCount (text files, office documents, pdf files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|16
 
|-
 
|WordCount (text files, office documents)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|876
 
|-
 
|Paragraphs (text files, office documents)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|19
 
|-
 
|Slides (presentation files)
 
|Object
 
|1.4 significantProperties
 
|1.4.2 significantPropertiesValue
 
|27
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/normalization.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced during normalization to preservation and access formats'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Name of normalization tool
 
|Agent
 
|3.2 agentName
 
|none
 
|FFmpeg version SVN-r19352-4:0.5+svn20090706-2ubuntu2.2
 
|-
 
|Event description
 
|Event
 
|2.2 eventType
 
|none
 
|Normalizing
 
|-
 
|Processing status
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|Processing completed
 
|-
 
|Normalization result
 
|Event
 
|2.5.2 eventOutcomeDetail
 
|2.5.2.1 eventOutcomeDetailNote
 
|
 
*Already in preservation format. No need to normalize.
 
*No default normalization tool defined.
 
*Output #0, wav, to '/tmp/MultimediaSIP-9ece5881-640e-4bdc-9863-4ff50046a0bd/objects/sample.wav': Stream #0.0: Audio: pcm_s16le, 8000 Hz, stereo, s16, 256 kb/s
 
|-
 
|colspan="6" style="background-color:silver;"|'''Source: /data/logs/MD5checksum.txtprepareAIP_check.log'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced after file normalization process. Checks that checksums for files in the SIP have not changed during normalization.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|Pass/fail notification
 
|Event
 
|2.5 eventOutcomeInformation
 
|2.5.1 eventOutcome
 
|
 
*PASSED
 
*FAILED
 
|-
 
| colspan="6" style="background-color:silver;"|'''Source: /data/logs/AIP.MD5checksum.txt'''
 
|-
 
|colspan="6" style="background-color:#E0FFFF;"|'''Process: Produced during BagIt process. Provides checksums for the AIP and for each original and normalized file in the AIP.'''
 
|-
 
!style="width:20%"|Description
 
!style="width:15%"|PREMIS entity
 
!style="width:15%"|PREMIS semantic unit
 
!style="width:15%"|PREMIS semantic component
 
!style="width:35%"|Sample value(s)
 
|-
 
|AIP checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|12b86e038bf0bddd5aba110c35f288b8
 
|-
 
|File checksum
 
|Object
 
|1.5.2 fixity
 
|1.5.2.2 messageDigest
 
|326e0206ae83f815e4be5f28464f6ac6
 
|-
 
|}
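
The FITS rows in the table above amount to a lookup from a FITS element name to a PREMIS semantic unit and component. A minimal sketch of that crosswalk in Python follows. The dictionary layout and the function name <code>map_fits_values</code> are assumptions made for illustration, not part of Archivematica, and the fallback to significantProperties is a simplification that only holds for the elements listed in the table.

```python
# Hypothetical crosswalk: FITS element -> (PREMIS semantic unit, semantic component).
# The entries are copied from the table above; the data structure is an assumption.
FITS_TO_PREMIS = {
    "format": ("1.5.4.1 formatDesignation", "1.5.4.1.1 formatName"),
    "version": ("1.5.4.1 formatDesignation", "1.5.4.1.2 formatVersion"),
    "externalIdentifier": ("1.5.4.2 formatRegistry", "1.5.4.2.2 formatRegistryKey"),
    "Size": ("1.5 objectCharacteristics", "1.5.3 size"),
}

# All remaining FITS elements in the table map to significantProperties.
SIGNIFICANT_PROPERTY = ("1.4 significantProperties", "1.4.2 significantPropertiesValue")


def map_fits_values(fits_values):
    """Return (semantic unit, semantic component, value) rows for a dict of FITS values."""
    rows = []
    for element, value in fits_values.items():
        # Elements without an explicit entry fall back to significantProperties.
        unit, component = FITS_TO_PREMIS.get(element, SIGNIFICANT_PROPERTY)
        rows.append((unit, component, value))
    return rows


sample = {"format": "Tagged Image File Format", "version": "6.0", "ImageWidth": "2464"}
for row in map_fits_values(sample):
    print(row)
```

Keeping the crosswalk as data rather than as branching logic means a new row in the table would only require a new dictionary entry.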
 
<br>
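
The checksum rows in the table above map a logged MD5 value to the PREMIS messageDigest component. The sketch below shows one way such a value could be turned into a fixity element. It assumes that each line of the checksum log has the form <code>&lt;digest&gt;  &lt;path&gt;</code>, which this page does not confirm. The function names and the sample path are hypothetical, and the XML is written without a namespace for brevity.

```python
import xml.etree.ElementTree as ET


def parse_checksum_line(line):
    """Split an assumed '<md5 hex digest>  <path>' line into (digest, path)."""
    digest, path = line.strip().split(None, 1)
    return digest, path


def build_fixity(digest, algorithm="MD5"):
    """Build a PREMIS-style fixity element (no XML namespace, illustration only)."""
    fixity = ET.Element("fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = algorithm
    ET.SubElement(fixity, "messageDigest").text = digest
    return fixity


# Hypothetical log line: the digest is the sample value from the table,
# the path is invented for this example.
line = "326e0206ae83f815e4be5f28464f6ac6  objects/Syllabus_final.doc"
digest, path = parse_checksum_line(line)
print(ET.tostring(build_fixity(digest), encoding="unicode"))
```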
 
 
 
 
 
==Mandatory PREMIS elements (mandatory semantic units + mandatory components)==
 
 
 
{| border="1" cellpadding="10" cellspacing="0" width=90%
 
|-
 
|- style="background-color:#cccccc;"
 
!style="width:10%"|'''Entity'''
 
!style="width:20%"|'''Semantic unit'''
 
!style="width:20%"|'''Semantic component'''
 
!style="width:20%"|'''Present in Archivematica?'''
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.1 objectIdentifierType
 
|No
 
|-
 
|Object
 
|1.1 objectIdentifier
 
|1.1.2 objectIdentifierValue
 
|Yes
 
|-
 
|Object
 
|1.2 objectCategory
 
|none
 
|No
 
|-
 
|Object
 
|1.5 objectCharacteristics
 
|1.5.1 compositionLevel
 
|No
 
|-
 
|Object
 
|1.5.4 objectCharacteristics/format
 
|Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used
 
|
 
*1.5.4.1.1 formatName: Yes
*1.5.4.2.1 formatRegistryName: No
*1.5.4.2.2 formatRegistryKey: Yes
 
|-
 
|Object
 
|1.7 storage
 
|Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded."
 
|No, but retrieval may be managed through UUIDs.
 
|-
 
|Event

|2.1 eventIdentifier
 
|2.1.1 eventIdentifierType
 
|No
 
|-
 
|Event
 
|2.1 eventIdentifier
 
|2.1.2 eventIdentifierValue
 
|No
 
|-
 
|Event
 
|2.2 eventType
 
|none
 
|Partial
 
|-
 
|Event
 
|2.3 eventDateTime
 
|none
 
|Partial
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.1 agentIdentifierType
 
|No
 
|-
 
|Agent
 
|3.1 agentIdentifier
 
|3.1.2 agentIdentifierValue
 
|No
 
|}
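
The gap comparison described at the top of this page can be expressed as a filter over the rows of the table above. The following is a minimal sketch that uses a subset of those rows. The tuple layout and the function name <code>find_gaps</code> are assumptions made for illustration.

```python
# (entity, semantic unit, semantic component, status), copied from the table above.
# None marks a semantic unit that has no semantic component.
MANDATORY = [
    ("Object", "1.1 objectIdentifier", "1.1.1 objectIdentifierType", "No"),
    ("Object", "1.1 objectIdentifier", "1.1.2 objectIdentifierValue", "Yes"),
    ("Object", "1.2 objectCategory", None, "No"),
    ("Event", "2.2 eventType", None, "Partial"),
    ("Event", "2.3 eventDateTime", None, "Partial"),
    ("Agent", "3.1 agentIdentifier", "3.1.1 agentIdentifierType", "No"),
]


def find_gaps(rows):
    """Return the rows whose status is anything other than 'Yes'."""
    return [row for row in rows if row[3] != "Yes"]


for entity, unit, component, status in find_gaps(MANDATORY):
    # Report the component when there is one, otherwise the semantic unit.
    print(f"{entity}: {component or unit} ({status})")
```

Rows marked "Partial" are treated as gaps here, on the assumption that a partially captured element still needs workflow changes.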
 
 
 
<br>
 
 
 
  
 
[[Category:Development documentation]]
 

Latest revision as of 16:34, 11 February 2020
