Difference between revisions of "Metadata elements"

From Archivematica

Revision as of 16:49, 9 September 2010


This page identifies a minimum set of metadata elements designed to ensure authenticity and interoperability of preserved objects and to facilitate their retrieval.

This process involves:

  1. Using the InterPARES Chain of Preservation (COP) model and the CoP/PREMIS crosswalk to identify required elements for objects preserved by Archivematica
  2. Analyzing existing metadata in the Archivematica AIP log files and METS.xml file in order to map them to METS and PREMIS elements (see Existing elements)
  3. Comparing the results of steps 1 and 2 in order to determine what gaps exist in Archivematica
  4. Filling in the gaps, e.g. by modifying the workflow to produce and/or capture missing elements
  5. Structuring the required elements into the Repository eXchange Package (RXP) specification
  6. Determining what metadata belongs in the DIP(s)



Proposed PREMIS metadata for original file

This table is a template for metadata elements for the original file. Please note the following:

  • The significantProperties semantic unit would be repeated as needed to capture all the significant property data produced by FITS.
  • The format semantic unit would be repeated as needed if FITS identified several possible formats for the file.
  • The eventOutcomeDetail semantic unit would be repeated as needed to capture detailed information generated by an event.
  • For most files, the relationship semantic unit would be used twice: once for the preservation copy and once for the access copy.
  • The event elements included in this table are an example only; a real PREMIS file would contain information about numerous events.
  • There will be at least two agent entities: an organization, such as City of Vancouver Archives, and Archivematica. The organization would be the agent for manual events such as reviewing the SIP, while Archivematica would be the agent for automated events such as normalization.


| Semantic unit | Semantic component | Sample value(s) | Notes |
| objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
| objectIdentifier | objectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | mandatory unit and component |
| objectCategory | none | file | mandatory unit and component |
| significantProperties | significantPropertiesType | ImageWidth | repeatable semantic unit |
| significantProperties | significantPropertiesValue | 1024 | repeatable semantic unit |
| objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
| objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
| objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
| objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
| objectCharacteristics | size | 787510 | |
| objectCharacteristics/format/formatDesignation | formatName | Windows Bitmap | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatDesignation | formatVersion | 3.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/116 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| originalName | none | /SAE Project files/newsletters/20100223/cover image.bmp | |
| relationship | relationshipType | derivation | |
| relationship | relationshipSubType | is source of | |
| relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | mandatory unit and component if there is a related object |
| relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component if there is a related object |
| relationship/relatedEventIdentification | relatedEventIdentifierType | Archivematica ID | "For derivative relationships between objects relatedEventIdentification must be recorded." |
| relationship/relatedEventIdentification | relatedEventIdentifierValue | [alphanumeric code] | "For derivative relationships between objects relatedEventIdentification must be recorded." |
| eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | mandatory unit and component |
| eventType | none | normalization | mandatory unit and component |
| eventDateTime | none | 2009-12-01T09:09:00-02:00 | mandatory unit and component |
| eventDetail | none | program=ImageMagick; version=6.6.4.0 | This element can be used to record information about software used and eliminates the need to have agent entities for software programs |
| eventOutcomeInformation | eventOutcome | normalization successful | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | used to link an agent to an event; not mandatory but recommended |
| agentIdentifier | agentIdentifierType | repository ID | mandatory unit and component |
| agentIdentifier | agentIdentifierValue | CVA | mandatory unit and component |
| agentName | none | City of Vancouver Archives | |
| agentType | none | organization | |
| agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
| agentIdentifier | agentIdentifierValue | Archivematica-0.6 | mandatory unit and component |
| agentName | none | Archivematica | |
| agentType | none | software | |
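
As an illustration of how the repeatable units in this template might be expanded, here is a minimal Python sketch that builds an object element from the sample values. It is not Archivematica code: the function name `build_object` is hypothetical, the element names simply follow the table, the ImageHeight value is invented for the example, and no PREMIS namespace or schema validation is applied.

```python
import xml.etree.ElementTree as ET


def build_object(identifier, significant_properties, formats):
    """Return an <object> element with the repeatable units expanded."""
    obj = ET.Element("object")

    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "UUID"
    ET.SubElement(ident, "objectIdentifierValue").text = identifier

    ET.SubElement(obj, "objectCategory").text = "file"

    # One significantProperties unit per property reported (for example by FITS).
    for prop_type, prop_value in significant_properties:
        unit = ET.SubElement(obj, "significantProperties")
        ET.SubElement(unit, "significantPropertiesType").text = prop_type
        ET.SubElement(unit, "significantPropertiesValue").text = str(prop_value)

    characteristics = ET.SubElement(obj, "objectCharacteristics")
    ET.SubElement(characteristics, "compositionLevel").text = "0"

    # One format unit per candidate format identified for the file.
    for name, version in formats:
        fmt = ET.SubElement(characteristics, "format")
        designation = ET.SubElement(fmt, "formatDesignation")
        ET.SubElement(designation, "formatName").text = name
        ET.SubElement(designation, "formatVersion").text = version

    return obj


example = build_object(
    "0db50321-6d7b-4291-89ec-a8b0adc1ff96",
    [("ImageWidth", 1024), ("ImageHeight", 768)],  # ImageHeight is a made-up value
    [("Windows Bitmap", "3.0")],
)
print(ET.tostring(example, encoding="unicode"))
```

Passing the properties and formats as lists keeps the repetition rule from the notes above explicit: each list entry becomes one repeated semantic unit.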


Proposed PREMIS metadata for normalized file (preservation copy)

Unlike the table above, this table shows all the metadata elements that should appear for a normalized file. The two events recorded are creation and checksum generation.

| Semantic unit | Semantic component | Sample value(s) | Notes |
| objectIdentifier | objectIdentifierType | UUID | mandatory unit and component |
| objectIdentifier | objectIdentifierValue | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | mandatory unit and component |
| objectCategory | none | file | mandatory unit and component |
| objectCharacteristics | compositionLevel | 0 | mandatory unit and component |
| objectCharacteristics/fixity | messageDigestAlgorithm | MD5 | |
| objectCharacteristics/fixity | messageDigest | e479688508922354bdab09bca60d8d0e | |
| objectCharacteristics/fixity | messageDigestOriginator | City of Vancouver Archives | |
| objectCharacteristics/format/formatDesignation | formatName | Tagged Image File Format | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatDesignation | formatVersion | 6.0 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatRegistry | formatRegistryName | PRONOM | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| objectCharacteristics/format/formatRegistry | formatRegistryKey | fmt/10 | format is a mandatory unit; must use either formatDesignation or formatRegistry |
| relationship | relationshipType | derivation | |
| relationship | relationshipSubType | has source | |
| relationship/relatedObjectIdentification | relatedObjectIdentifierType | UUID | |
| relationship/relatedObjectIdentification | relatedObjectIdentifierValue | 0db50321-6d7b-4291-89ec-a8b0adc1ff96 | |
| eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | mandatory unit and component |
| eventType | none | creation | mandatory unit and component |
| eventDateTime | none | 2010-08-01T09:08:44-03:00 | mandatory unit and component |
| eventDetail | none | program=ImageMagick; version=6.6.4.0; command=%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat% | |
| eventIdentifier | eventIdentifierType | Archivematica ID | mandatory unit and component |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | mandatory unit and component |
| eventType | none | message digest calculation | mandatory unit and component |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | mandatory unit and component |
| eventDetail | none | program="MD5deep"; version="3.6" | |
| linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | used to link an agent to an event; not mandatory but recommended |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | used to link an agent to an event; not mandatory but recommended |
| agentIdentifier | agentIdentifierType | preservation system | mandatory unit and component |
| agentIdentifier | agentIdentifierValue | Archivematica-0.6 | mandatory unit and component |
| agentName | none | Archivematica | |
| agentType | none | software | |
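
The eventDetail samples on this page pack several key=value pairs into one string, sometimes quoted and sometimes not. The following Python sketch shows one way such a string could be split back into its parts. The function name `parse_event_detail` is hypothetical, and the parsing rule (pairs separated by semicolons, optional double quotes around the value) is an assumption inferred from the samples rather than a documented format.

```python
import re

# key=value pairs separated by ";"; the value may be wrapped in double quotes.
_PAIR = re.compile(r'\s*(\w+)=(?:"([^"]*)"|([^;]*))\s*(?:;|$)')


def parse_event_detail(detail):
    """Split an eventDetail string into a dict of its key=value pairs."""
    result = {}
    for match in _PAIR.finditer(detail):
        key, quoted, bare = match.groups()
        result[key] = quoted if quoted is not None else bare.strip()
    return result


print(parse_event_detail('program="MD5deep"; version="3.6"'))
print(parse_event_detail(
    "program=ImageMagick; version=6.6.4.0; "
    "command=%convertPath% %fileFullName% +compress "
    "%preservationFileDirectory%%fileTitle%.%preservationFormat%"
))
```

A value that itself contains a semicolon would need to be quoted for this sketch to handle it correctly.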


Event metadata

Receive SIP (SIP gets placed in 1-receiveSIP)

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | ingestion | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | |


Check checksums

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | fixity check | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="MD5Deep"; version="3.6" | |
| eventOutcomeInformation | eventOutcome | {pass; fail} | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |
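
A fixity check of this kind amounts to recomputing the digest and comparing it with the recorded one. The sketch below uses Python's standard hashlib module in place of MD5Deep, purely for illustration; the function names `md5_hex` and `fixity_outcome` are hypothetical, and the outcome strings are taken from the eventOutcome sample values above.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(stream, recorded_digest):
    """Map a digest comparison onto the eventOutcome values {pass; fail}."""
    matches = md5_hex(stream) == recorded_digest.strip().lower()
    return "pass" if matches else "fail"


# In-memory data stands in for a file from the SIP.
payload = b"example file content"
recorded = hashlib.md5(payload).hexdigest()
print(fixity_outcome(io.BytesIO(payload), recorded))         # pass
print(fixity_outcome(io.BytesIO(payload + b"!"), recorded))  # fail
```

Reading in chunks keeps memory use flat for large files; with a real file, the stream would be opened in binary mode.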


Generate checksums

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | message digest calculation | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="MD5Deep"; version="3.6" | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | e479688508922354bdab09bca60d8d0e | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Review SIP

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | SIP review | |
| eventDateTime | none | [date - may not be automatically generated] | |
| eventDetail | none | [free text field - could include information about the Submission Information Agreement against which the SIP was checked, etc.] | |
| eventOutcomeInformation | eventOutcome | {pass; conditional pass} | |
| eventOutcomeDetail | eventOutcomeDetailNote | {some files missing; appraisal required} | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | |


Place SIP in quarantine

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | start quarantine | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | |


Remove SIP from quarantine

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | end quarantine | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Unpack zipped files

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | unpack | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="easyextract"; version="0.1.0" | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | unpacked Newsletter.zip | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Assign UUIDs

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | create UUID | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="UUID"; version="1.6.2" | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | 270bd067-0483-4c5f-bdec-f2cbd6e651aa | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |
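
To show how the rows of this table could fit together as one record, here is a minimal Python sketch that generates a UUID and wraps it in an event dictionary keyed by semantic component. It uses Python's standard uuid module rather than the UUID program named in eventDetail. The function name `create_uuid_event`, the flat dictionary layout, and the caller-supplied `event_id` are assumptions for illustration, since the page does not specify how the [alphanumeric code] is produced.

```python
import uuid
from datetime import datetime, timezone


def create_uuid_event(event_id, agent="Archivematica-0.6"):
    """Assign a new random UUID and describe the assignment as an event."""
    file_uuid = str(uuid.uuid4())
    return {
        "eventIdentifierType": "Archivematica ID",
        "eventIdentifierValue": event_id,  # supplied by the caller (assumption)
        "eventType": "create UUID",
        # ISO 8601 timestamp with an explicit UTC offset.
        "eventDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # The new UUID is recorded as the outcome detail, as in the table.
        "eventOutcomeDetailNote": file_uuid,
        "linkingAgentIdentifierType": "repository system",
        "linkingAgentIdentifierValue": agent,
    }


event = create_uuid_event("hypothetical-event-id")
for component, value in event.items():
    print(f"{component}: {value}")
```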


Remove prohibited characters

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | filename cleanup | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="detox"; version="1.2.0-1" | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | Original name="cover image.bmp"; cleaned up name="cover_image.bmp" | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |
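
The eventOutcomeDetailNote for this event pairs the original name with the cleaned-up one. The Python sketch below reproduces the sample value with a deliberately simple rule: every character outside letters, digits, dot, underscore and hyphen becomes an underscore. This rule is an assumption for illustration only and does not claim to match the actual behaviour of detox; both function names are hypothetical.

```python
import re


def clean_filename(name):
    """Replace every character outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def cleanup_note(original, cleaned):
    """Format the note in the style of the sample value in the table."""
    return f'Original name="{original}"; cleaned up name="{cleaned}"'


original = "cover image.bmp"
cleaned = clean_filename(original)
print(cleanup_note(original, cleaned))
# Original name="cover image.bmp"; cleaned up name="cover_image.bmp"
```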


Scan for viruses

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | virus check | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="Clam AV"; version="0.95.2" | |
| eventOutcomeInformation | eventOutcome | pass | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Identify format

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | format identification | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="File Information Toolset"; version="0.2.6" | |
| eventOutcomeInformation | eventOutcome | {positive; tentative; unidentified} | |
| eventOutcomeDetail | eventOutcomeDetailNote | fmt/116 | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Validate format

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | validation | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="File Information Toolset"; version="0.2.6" | |
| eventOutcomeInformation | eventOutcome | {pass; partial pass; fail} | |
| eventOutcomeDetail | eventOutcomeDetailNote | format="Windows Bitmap"; version="3.0"; result="Well-formed and valid" | It is important to include the format and version against which JHOVE is validating the file; otherwise it can misidentify or fail to identify the format and give a false positive for validation (for example, when it identifies the format as "bytestream" and then declares the file "Well-formed and valid"). This semantic unit can be repeated if there is a specific error message relating to failed validation. |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Normalize to preservation format

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | normalization | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program=ImageMagick; version=6.6.4.0; command=%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat% | |
| eventOutcomeInformation | eventOutcome | {Normalized; Not normalized} | |
| eventOutcomeDetail | eventOutcomeDetailNote | {Normalization failed; Already in preservation format. No need to normalize} | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository ID | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | CVA | |


Generate access copy

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | access copy generation | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% %accessFileDirectory%%fileTitle%.%accessFormat%" | |
| eventOutcomeInformation | eventOutcome | {Access copy generated; access copy not generated} | |
| eventOutcomeDetail | eventOutcomeDetailNote | {Normalization failed; Already in access format. No need to normalize} | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |


Create new file (create normalized file)

| Semantic unit | Semantic component | Sample value(s) | Notes |
| eventIdentifier | eventIdentifierType | Archivematica ID | |
| eventIdentifier | eventIdentifierValue | [alphanumeric code] | |
| eventType | none | creation | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="ImageMagick"; version="6.6.4.0"; command="%convertPath% %fileFullName% +compress %preservationFileDirectory%%fileTitle%.%preservationFormat%" | |
| eventOutcomeInformation | eventOutcome | | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | repository system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-0.6 | |
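
The command recorded in eventDetail is a template whose %name% placeholders are filled in for each file. As a rough illustration of that substitution, the following Python sketch expands the template from the table. The function name `expand_command` and all of the paths in the example are hypothetical; how Archivematica itself resolves these placeholders is not described on this page.

```python
import re


def expand_command(template, values):
    """Replace each %name% placeholder in the template with values[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder %{name}%")
        return values[name]

    return re.sub(r"%(\w+)%", substitute, template)


template = (
    "%convertPath% %fileFullName% +compress "
    "%preservationFileDirectory%%fileTitle%.%preservationFormat%"
)
values = {  # hypothetical example values
    "convertPath": "/usr/bin/convert",
    "fileFullName": "/tmp/sip/cover_image.bmp",
    "preservationFileDirectory": "/tmp/aip/",
    "fileTitle": "cover_image",
    "preservationFormat": "tif",
}
print(expand_command(template, values))
# /usr/bin/convert /tmp/sip/cover_image.bmp +compress /tmp/aip/cover_image.tif
```

Raising an error for an unknown placeholder, instead of leaving it in the command, avoids recording or running a half-expanded command line.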


Mandatory PREMIS elements (mandatory semantic units + mandatory components)

| Entity | Semantic unit | Semantic component | Present in Archivematica? |
| Object | 1.1 objectIdentifier | 1.1.1 objectIdentifierType | No |
| Object | 1.1 objectIdentifier | 1.1.2 objectIdentifierValue | Yes |
| Object | 1.2 objectCategory | none | No |
| Object | 1.5 objectCharacteristics | 1.5.1 compositionLevel | No |
| Object | 1.5.4 objectCharacteristics/format | Either 1.5.4.1 formatDesignation or 1.5.4.2 formatRegistry must be used | |
| Object | 1.5.4 objectCharacteristics/format | 1.5.4.1.1 formatName | Yes |
| Object | 1.5.4 objectCharacteristics/format | 1.5.4.2.1 formatRegistryName | No |
| Object | 1.5.4 objectCharacteristics/format | 1.5.4.2.2 formatRegistryKey | Yes |
| Object | 1.7 storage | Either 1.7.1 contentLocation or 1.7.2 storageMedium must be used. However, "if the preservation repository uses the objectIdentifier as a handle for retrieving data, contentLocation is implicit and does not need to be recorded." | No, but retrieval may be managed through UUIDs. |
| Event | 2.1 eventIdentifier | 2.1.1 eventIdentifierType | No |
| Event | 2.1 eventIdentifier | 2.1.2 eventIdentifierValue | No |
| Event | 2.2 eventType | none | Partial |
| Event | 2.3 eventDateTime | none | Partial |
| Agent | 3.1 agentIdentifier | 3.1.1 agentIdentifierType | No |
| Agent | 3.1 agentIdentifier | 3.1.2 agentIdentifierValue | No |
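
A gap analysis like the one in this table can be expressed as a simple check of a record against the list of mandatory components. The Python sketch below is one possible form of such a check. The flat dictionary layout and the names `MANDATORY` and `missing_mandatory` are assumptions for illustration, and the storage rule is left out because the table notes that retrieval may be managed through UUIDs.

```python
# Mandatory components per entity, taken from the table above.
MANDATORY = {
    "object": ["objectIdentifierType", "objectIdentifierValue",
               "objectCategory", "compositionLevel"],
    "event": ["eventIdentifierType", "eventIdentifierValue",
              "eventType", "eventDateTime"],
    "agent": ["agentIdentifierType", "agentIdentifierValue"],
}


def missing_mandatory(entity, record):
    """Return the mandatory components that are absent or empty in record."""
    missing = [name for name in MANDATORY[entity] if not record.get(name)]
    if entity == "object":
        # Either formatDesignation or formatRegistry must be used.
        has_designation = bool(record.get("formatName"))
        has_registry = bool(record.get("formatRegistryName")
                            and record.get("formatRegistryKey"))
        if not (has_designation or has_registry):
            missing.append("format (formatDesignation or formatRegistry)")
    return missing


event = {"eventIdentifierType": "Archivematica ID", "eventType": "ingestion"}
print(missing_mandatory("event", event))
# ['eventIdentifierValue', 'eventDateTime']
```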