Difference between revisions of "TRIM exports"

From Archivematica
 
(101 intermediate revisions by 3 users not shown)

Latest revision as of 16:28, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

This page documents ingest of TRIM exports based on requirements for VanDocs ingest at City of Vancouver Archives.

TRIM export contents

A TRIM export consists of:

  • One or more containers
  • A manifest of the transfer (manifest.txt)
  • XML schema documentation for all XML files in the transfer (container, location and document XML metadata)
  • Location metadata (Location.xml)
  • Container metadata (ContainerMetadata.xml)
  • Document metadata (e.g. DOC_2012_000100_Metadata.xml)
  • Documents (e.g. DOC_2012_000100.docx)


VanDocs1g.png


Processing a TRIM export

Parsing contents to the SIP

  • Each transfer is broken into one SIP per container
  • manifest.txt is copied to metadata/submissionDocumentation/
  • Location.xml is copied to metadata/
  • All schema documentation is copied to metadata/
  • The relevant ContainerMetadata.xml is copied to metadata/
  • The relevant document metadata files are copied to metadata/
  • All documents are copied to objects/
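The copy rules above can be sketched as a function that plans where each file of one container ends up in its SIP. This is only an illustration: the function name `plan_sip`, its parameters, and the source paths in the usage example are hypothetical, since the on-disk layout of a TRIM export is not specified on this page.

```python
from pathlib import PurePosixPath


def plan_sip(container_metadata, document_metadata, documents, schema_docs):
    """Return (source, destination) pairs for the SIP of a single container.

    All arguments are paths relative to the transfer root (hypothetical layout).
    """
    meta = PurePosixPath("metadata")
    objects = PurePosixPath("objects")

    # Transfer-level files are repeated in every SIP.
    plan = [
        ("manifest.txt", meta / "submissionDocumentation" / "manifest.txt"),
        ("Location.xml", meta / "Location.xml"),
        (container_metadata, meta / PurePosixPath(container_metadata).name),
    ]
    # Schema documentation and per-document metadata go to metadata/.
    plan += [(p, meta / PurePosixPath(p).name) for p in schema_docs]
    plan += [(p, meta / PurePosixPath(p).name) for p in document_metadata]
    # The documents themselves go to objects/.
    plan += [(p, objects / PurePosixPath(p).name) for p in documents]
    return [(str(src), str(dst)) for src, dst in plan]
```

Calling `plan_sip` once per container yields one SIP plan per container, which matches the "one SIP per container" rule.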


A SIP generated from a TRIM export


Verifying manifest

The contents of the transfer must be verified against the manifest.txt file during the "Verify transfer compliance" micro-service. Associated PREMIS event: manifest check. See below for details.

Manifest check

{| border="1" cellpadding="10" cellspacing="0" width="90%"
|-
! Semantic unit !! Semantic component !! Sample value(s) !! Notes
|-
| eventIdentifier || eventIdentifierType || UUID ||
|-
| eventIdentifier || eventIdentifierValue || 21h50321-6d7b-3855-89ag-a8b0fhc1f256 ||
|-
| eventType || none || manifest check ||
|-
| eventDateTime || none || 2011-08-01T09:08:46-01:00 ||
|-
| eventDetail || none || ||
|-
| eventOutcomeInformation || eventOutcome || {pass; fail} ||
|-
| eventOutcomeDetail || eventOutcomeDetailNote || ||
|-
| linkingAgentIdentifier || linkingAgentIdentifierType || preservation system ||
|-
| linkingAgentIdentifier || linkingAgentIdentifierValue || Archivematica-1.0 ||
|}
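A manifest check that records its result in the semantic units listed above might look like the following sketch. It makes two assumptions that this page does not confirm: that manifest.txt lists one relative path per line, and that a file present in the transfer but absent from the manifest also counts as a failure. The function name `manifest_check_event` is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def manifest_check_event(manifest_lines, transfer_paths):
    """Compare manifest entries with transfer contents; return a PREMIS-like event."""
    # Assumption: one relative path per non-empty manifest line.
    listed = {line.strip() for line in manifest_lines if line.strip()}
    present = set(transfer_paths)

    missing = sorted(listed - present)    # in the manifest, not in the transfer
    unlisted = sorted(present - listed)   # in the transfer, not in the manifest
    outcome = "pass" if not missing and not unlisted else "fail"
    note = "" if outcome == "pass" else f"missing: {missing}; not in manifest: {unlisted}"

    return {
        "eventIdentifierType": "UUID",
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": "manifest check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "eventDetail": "",
        "eventOutcome": outcome,
        "eventOutcomeDetailNote": note,
        "linkingAgentIdentifierType": "preservation system",
        "linkingAgentIdentifierValue": "Archivematica-1.0",
    }
```

The caller decides which paths belong in `transfer_paths`, for example whether manifest.txt itself is excluded from the comparison.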


Verifying checksums

Each document metadata file contains an MD5 checksum for the document:


Checksumg.png


These checksums must be verified during the "Verify transfer checksums" micro-service. Associated PREMIS event: fixity check.


Fixity check

{| border="1" cellpadding="10" cellspacing="0" width="90%"
|-
! Semantic unit !! Semantic component !! Sample value(s) !! Notes
|-
| eventIdentifier || eventIdentifierType || UUID ||
|-
| eventIdentifier || eventIdentifierValue || 73f87321-6d7b-3855-89ag-a8b0fhc1f256 ||
|-
| eventType || none || fixity check ||
|-
| eventDateTime || none || 2010-08-01T09:08:46-01:00 ||
|-
| eventDetail || none || program="MD5Deep"; version="3.6" ||
|-
| eventOutcomeInformation || eventOutcome || {pass; fail} ||
|-
| eventOutcomeDetail || eventOutcomeDetailNote || ||
|-
| linkingAgentIdentifier || linkingAgentIdentifierType || preservation system ||
|-
| linkingAgentIdentifier || linkingAgentIdentifierValue || Archivematica-1.0 ||
|}
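The comparison behind the fixity check can be illustrated with Python's standard `hashlib` module. Note that the sample event above names MD5Deep as the program; the sketch below substitutes `hashlib` purely for illustration. The function names are hypothetical, and the expected checksum is assumed to have already been read from the document metadata file.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, expected_md5):
    """Return "pass" or "fail", the eventOutcome values listed above."""
    # Normalise case and whitespace before comparing hex digests.
    return "pass" if md5_of(path) == expected_md5.strip().lower() else "fail"
```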


The AIP METS file

dmdSecs

  • Each container will have one dmdSec consisting of Dublin Core metadata derived from the TRIM export metadata (ContainerMetadata.xml)
  • Each file will have one dmdSec consisting of Dublin Core metadata derived from the TRIM export metadata (e.g. DOC_2012_000100_Metadata.xml)


DmdSecsg.png


Container metadata mapping

{| border="1" cellpadding="10" cellspacing="0" width="100%"
|-
! TRIM element !! DC element !! RAD/AtoM element !! Comments
|-
| <TitleFreeTextPart> || <dcterms:title> || Title proper ||
|-
| <Department> || <dcterms:creator> || Name || AtoM adds a Name field linked to the Date(s) of creation field
|-
| <DateModified> || <dcterms:date> || Date(s) of creation || Date range based on earliest and latest DateModified in document metadata
|-
| <OPR> || <dcterms:provenance> || Immediate source of acquisition ||
|-
| <RecordNumber> || <dc:identifier> || Identifier || Only the numbers to the right of the slash in this field are used, e.g. 04-4000/0000070 --> 0000070
|-
| n/a || <dcterms:extent> || Physical description || Count of documents in the SIP plus fixed text: "digital objects"
|-
| n/a || n/a || Level of description || Level of description taken from METS structMap div TYPE
|-
| <FullClassificationNumber> || <dcterms:isPartOf> || n/a || Field does not map to RAD but is used along with <OPR> to determine DIP upload destination
|}


Sample container description

{| border="1" cellpadding="10" cellspacing="0" width="100%"
|-
! TRIM !! AtoM
|-
| <TitleFreeTextPart> PCI Compliance || Title proper: PCI Compliance
|-
| <Department> IT Strategy, Business Relationships and Projects - IT || Name: IT Strategy, Business Relationships and Projects - IT
|-
| <DateModified> 2010-03-01T18:20:15-08:00 / 2012-05-01T19:26:23-08:00 || Date(s) of creation: 2010-03-01 - 2012-05-01
|-
| <OPR> IT Business Strategies || Immediate source of acquisition: IT Business Strategies
|-
| <RecordNumber> 04-4000/0000070 || Identifier: 0000070
|-
| n/a || Physical description: 184 digital objects
|-
| n/a || Level of description: File
|-
| <FullClassificationNumber> 04-4000-20 ||
|}
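The container mapping can be sketched as a small function that applies the rules from the Comments column: the identifier keeps only the part after the slash, the date is a range spanning the earliest and latest document DateModified values, and the extent is a document count plus fixed text. The function name `container_to_dc`, the dictionary-based input, and the " - " range separator (copied from the AtoM sample above) are assumptions for illustration only.

```python
from datetime import datetime


def container_to_dc(container, document_dates, document_count):
    """Map TRIM container fields to Dublin Core terms.

    container: dict keyed by TRIM element name (hypothetical input shape).
    document_dates: DateModified values taken from the document metadata files.
    """
    stamps = [datetime.fromisoformat(value) for value in document_dates]
    earliest, latest = min(stamps), max(stamps)
    return {
        "dcterms:title": container["TitleFreeTextPart"],
        "dcterms:creator": container["Department"],
        # Date range from the earliest and latest document DateModified.
        "dcterms:date": f"{earliest.date().isoformat()} - {latest.date().isoformat()}",
        "dcterms:provenance": container["OPR"],
        # Keep only what follows the slash: 04-4000/0000070 -> 0000070.
        "dc:identifier": container["RecordNumber"].split("/")[-1],
        "dcterms:extent": f"{document_count} digital objects",
        "dcterms:isPartOf": container["FullClassificationNumber"],
    }
```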


Document metadata mapping

{| border="1" cellpadding="10" cellspacing="0" width="100%"
|-
! TRIM element !! DC element !! RAD/AtoM element !! Comments
|-
| <TitleFreeTextPart> || <dc:title> || Title proper ||
|-
| <DateModified> || <dc:date> || Date(s) of creation ||
|-
| <RecordNumber> || <dc:identifier> || Identifier ||
|-
| n/a || n/a || Level of description || Level of description will be obtained from METS structMap div TYPE
|}



Sample document description

{| border="1" cellpadding="10" cellspacing="0" width="100%"
|-
! TRIM !! AtoM
|-
| <TitleFreeTextPart> MCPP Project Report || Title proper: MCPP Project Report
|-
| <DateModified> 2010-03-01T18:20:15-08:00 || Date(s) of creation: 2010-03-01
|-
| <RecordNumber> DOC/2010/000100 || Identifier: DOC/2010/000100
|-
| n/a || Level of description: Item
|}
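Reading the three mapped elements out of a document metadata file could look like the sketch below. The XML structure is an assumption: the elements are searched for anywhere under the root and without namespaces, and the root element name used in the usage example is invented. Truncating the timestamp to a date follows the AtoM sample above; whether the dmdSec itself stores the full timestamp is not stated on this page.

```python
import xml.etree.ElementTree as ET


def document_to_dc(xml_text):
    """Extract the mapped TRIM elements from a document metadata XML string."""
    root = ET.fromstring(xml_text)

    def text(tag):
        # Search all descendants; a missing element yields an empty string.
        return (root.findtext(f".//{tag}") or "").strip()

    return {
        "dc:title": text("TitleFreeTextPart"),
        "dc:date": text("DateModified")[:10],  # keep the YYYY-MM-DD part only
        "dc:identifier": text("RecordNumber"),
    }
```

Unlike the container mapping, the document RecordNumber is used whole (DOC/2010/000100), as shown in the sample document description.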



amdSecs

  • Each container will have an amdSec consisting of:
    • A digiprovMD with an xlink reference to metadata/ContainerMetadata.xml


Sample amdSec for a container


  • Each file will have an amdSec consisting of:
    • A rightsMD populated with PREMIS rights (see Flagging closed AIPs, below)
    • A digiprovMD with an xlink reference to the relevant document metadata XML file
    • A techMD and digiprovMDs generated by Archivematica during processing


Sample amdSec for a file


fileSec and structMaps

  • Each METS file will have two structMaps: the Archivematica default structMap and a logical structMap that arranges the container hierarchically as a file with its child items
  • The container and file div TYPE elements in the logical structMap will map to the RAD Level of description field in AtoM
  • The structMap contains the links between containers and files and their relevant dmdSecs
  • The structMap also contains the link between the container and its amdSec
  • The files are linked to their amdSecs in the fileSec
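The logical structMap described above could be assembled roughly as follows, using the standard METS attributes TYPE, LABEL, DMDID, ADMID and FILEID. This is a sketch, not the Archivematica implementation: the function name `logical_struct_map` and all identifier values in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def logical_struct_map(container_label, container_dmdid, container_admid, items):
    """Build a logical structMap: one "file" div with child "item" divs.

    items: iterable of (label, dmdid, file_id) tuples, one per document.
    """
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", TYPE="logical")
    # The container div links to its dmdSec and to its amdSec.
    file_div = ET.SubElement(
        struct_map,
        f"{{{METS_NS}}}div",
        TYPE="file",
        LABEL=container_label,
        DMDID=container_dmdid,
        ADMID=container_admid,
    )
    for label, dmdid, file_id in items:
        # Item divs link to their dmdSec only; the amdSec link lives in the fileSec.
        item_div = ET.SubElement(
            file_div, f"{{{METS_NS}}}div", TYPE="item", LABEL=label, DMDID=dmdid
        )
        ET.SubElement(item_div, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return struct_map
```

`ET.tostring(struct_map, encoding="unicode")` serializes the result for inspection.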


StructMapg.png


Flagging closed AIPs

  • The container metadata file (ContainerMetadata.xml) has two fields, DateClosed and RetentionSchedule, whose values will be used to populate the PREMIS rights entity in the SIP (in the METS <rightsMD> element). Examples are:
    • <DateClosed>2012-08-17T16:13:31-08:00</DateClosed>
    • <RetentionSchedule>EV2.3.A</RetentionSchedule>
  • The DateClosed field will be used to populate the termOfRestriction startDate in the PREMIS rights entity
  • The DateClosed and RetentionSchedule fields will be used to calculate the termOfRestriction endDate in the PREMIS rights entity. For the examples provided above, Archivematica would calculate 5 years from the end of 2012-08-17 and then to the end of the calendar year, for a result of 2017-12-31.
  • The closure period would also be captured as a standardized free text entry in the rightsGrantedNote field of the PREMIS rights entity, for example: Closed until 2017-12-31.
  • Other PREMIS fields would be auto-populated for every VanDocs ingest as shown in the screenshot below.
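The end-date rule in the example above (add the retention period to the closure date, then extend to the end of that calendar year) can be worked through in a few lines. The lookup table `RETENTION_YEARS` is hypothetical: the 5-year value for EV2.3.A is inferred from the worked example on this page, not taken from an actual retention schedule.

```python
from datetime import date, datetime

# Hypothetical lookup; the 5-year period is inferred from the example above.
RETENTION_YEARS = {"EV2.3.A": 5}


def restriction_dates(date_closed, retention_years):
    """Return (startDate, endDate) for the PREMIS termOfRestriction."""
    closed = datetime.fromisoformat(date_closed).date()
    # Adding whole years and rounding up to year end always lands on 31 December.
    end = date(closed.year + retention_years, 12, 31)
    return closed.isoformat(), end.isoformat()


start, end = restriction_dates(
    "2012-08-17T16:13:31-08:00", RETENTION_YEARS["EV2.3.A"]
)
rights_granted_note = f"Closed until {end}"
```

For the sample values, `start` is 2012-08-17 and `end` is 2017-12-31.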


VanDocs rights.png

DIP upload

  • Upon DIP upload to AtoM, the container will become a file-level description, with level of description populated by the structMap div label for the container ("file"). Each object in the DIP will become a child level with the level of description populated by the structMap div label for the object ("item").
  • Descriptive metadata in RAD will be populated by the appropriate dmdSec for each container and object (see container and document metadata mapping, above).