TRIM exports

From Archivematica
Revision as of 13:43, 23 March 2017 by Hbecker (talk | contribs) (Move to feature requirements category)

This page documents ingest of TRIM exports based on requirements for VanDocs ingest at City of Vancouver Archives.

TRIM export contents

A TRIM export consists of:

  • 1 or more containers
  • A manifest of the transfer (manifest.txt)
  • XML schema documentation for all xml files in the transfer (container, location and document xml metadata)
  • Location metadata (Location.xml)
  • Container metadata (ContainerMetadata.xml)
  • Document metadata (eg DOC_2012_000100_Metadata.xml)
  • Documents (eg DOC_2012_000100.docx)


VanDocs1g.png


Processing a TRIM export

Parsing contents to the SIP

  • Each transfer is broken into one SIP per container
  • manifest.txt is copied to metadata/submissionDocumentation/
  • Location.xml is copied to metadata/
  • All schema documentation is copied to metadata/
  • The relevant ContainerMetadata.xml is copied to metadata/
  • The relevant document metadata files are copied to metadata/
  • All documents are copied to objects/
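The copy rules above can be sketched as a small routing function. This is only an illustration, not Archivematica's code: the name `sip_destination` is hypothetical, and it assumes that metadata and schema documentation can be recognised by an `.xml` or `.xsd` extension, which the requirements do not state. It also leaves out the step of splitting the transfer into one SIP per container.

```python
from pathlib import PurePosixPath


def sip_destination(filename: str) -> str:
    """Return the SIP-relative path for one file from a TRIM export."""
    name = PurePosixPath(filename).name
    if name == "manifest.txt":
        return f"metadata/submissionDocumentation/{name}"
    # Assumption: Location.xml, ContainerMetadata.xml, document metadata and
    # schema documentation are all identified by their file extension.
    if name.lower().endswith((".xml", ".xsd")):
        return f"metadata/{name}"
    # Everything else is treated as a document.
    return f"objects/{name}"
```

Under this assumption a document that is itself an XML file would be routed to metadata/, so a real implementation would more likely pair each document with its `_Metadata.xml` file by name.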


A SIP generated from a TRIM export


Verifying manifest

The contents of the transfer must be verified against the manifest.txt file during the "Verify transfer compliance" micro-service. Associated PREMIS event: manifest check. See below for details.

Manifest check

| Semantic unit | Semantic component | Sample value(s) | Notes |
|---|---|---|---|
| eventIdentifier | eventIdentifierType | UUID | |
| eventIdentifier | eventIdentifierValue | 21h50321-6d7b-3855-89ag-a8b0fhc1f256 | |
| eventType | none | manifest check | |
| eventDateTime | none | 2011-08-01T09:08:46-01:00 | |
| eventDetail | none | | |
| eventOutcomeInformation | eventOutcome | {pass; fail} | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-1.0 | |
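As a rough sketch of the manifest check, the comparison can be treated as a set difference between the names listed in manifest.txt and the names actually present in the transfer. The layout of manifest.txt is not described here, so the hypothetical function `manifest_check` takes names that have already been parsed. It returns a value for eventOutcome and a note that could go in eventOutcomeDetailNote.

```python
def manifest_check(listed, present):
    """Compare manifest entries with the files found in the transfer.

    Returns (outcome, note), where outcome is "pass" or "fail".
    """
    listed, present = set(listed), set(present)
    missing = sorted(listed - present)      # in the manifest, not in the transfer
    unexpected = sorted(present - listed)   # in the transfer, not in the manifest
    notes = []
    if missing:
        notes.append("missing: " + ", ".join(missing))
    if unexpected:
        notes.append("not in manifest: " + ", ".join(unexpected))
    outcome = "fail" if notes else "pass"
    return outcome, "; ".join(notes)
```

Whether manifest.txt lists itself or the schema documentation is not specified, so the caller would need to decide which names to pass in.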


Verifying checksums

Each document metadata file contains an MD5 checksum for the document:


Checksumg.png


These checksums must be verified during the "Verify transfer checksums" micro-service. Associated PREMIS event: fixity check.
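A minimal sketch of the verification step, using Python's standard `hashlib` rather than the MD5Deep tool named in the event detail. The function names are hypothetical. The expected checksum is passed in as a string, because the name of the metadata element that holds it is not given in the text.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, expected_md5):
    """Return "pass" if the file matches the checksum from its metadata file."""
    # Normalise case and whitespace so the comparison is not tripped up by formatting.
    expected = expected_md5.strip().lower()
    return "pass" if md5_of_file(path) == expected else "fail"
```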


Fixity check

| Semantic unit | Semantic component | Sample value(s) | Notes |
|---|---|---|---|
| eventIdentifier | eventIdentifierType | UUID | |
| eventIdentifier | eventIdentifierValue | 73f87321-6d7b-3855-89ag-a8b0fhc1f256 | |
| eventType | none | fixity check | |
| eventDateTime | none | 2010-08-01T09:08:46-01:00 | |
| eventDetail | none | program="MD5Deep"; version="3.6" | |
| eventOutcomeInformation | eventOutcome | {pass; fail} | |
| eventOutcomeDetail | eventOutcomeDetailNote | | |
| linkingAgentIdentifier | linkingAgentIdentifierType | preservation system | |
| linkingAgentIdentifier | linkingAgentIdentifierValue | Archivematica-1.0 | |
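To show how the semantic units in the event tables fit together, the sketch below builds an event element with Python's `xml.etree.ElementTree`. It is an illustration only: `premis_event` is a hypothetical helper, the PREMIS namespace is left out for brevity, and eventOutcomeDetail is nested inside eventOutcomeInformation as the PREMIS data dictionary defines it.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def premis_event(event_type, outcome, detail="", note="", agent="Archivematica-1.0"):
    """Build an un-namespaced PREMIS-style event element."""
    event = ET.Element("event")

    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "UUID"
    ET.SubElement(identifier, "eventIdentifierValue").text = str(uuid.uuid4())

    ET.SubElement(event, "eventType").text = event_type
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    ET.SubElement(event, "eventDateTime").text = now
    ET.SubElement(event, "eventDetail").text = detail

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome_info, "eventOutcome").text = outcome  # "pass" or "fail"
    outcome_detail = ET.SubElement(outcome_info, "eventOutcomeDetail")
    ET.SubElement(outcome_detail, "eventOutcomeDetailNote").text = note

    agent_id = ET.SubElement(event, "linkingAgentIdentifier")
    ET.SubElement(agent_id, "linkingAgentIdentifierType").text = "preservation system"
    ET.SubElement(agent_id, "linkingAgentIdentifierValue").text = agent
    return event
```

For example, `premis_event("fixity check", "pass", detail='program="MD5Deep"; version="3.6"')` yields an element that `ET.tostring` can serialise for inclusion in a digiprovMD.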


The AIP METS file

dmdSecs

  • Each container will have one dmdSec consisting of Dublin Core metadata derived from the TRIM export metadata (ContainerMetadata.xml)
  • Each file will have one dmdSec consisting of Dublin Core metadata derived from the TRIM export metadata (eg DOC_2012_000100_Metadata.xml)


DmdSecsg.png


Container metadata mapping

| TRIM element | DC element | RAD/AtoM element | Comments |
|---|---|---|---|
| <TitleFreeTextPart> | <dcterms:title> | Title proper | |
| <Department> | <dcterms:creator> | Name | AtoM adds a Name field linked to the Date(s) of creation field |
| <DateModified> | <dcterms:date> | Date(s) of creation | Date range based on earliest and latest DateModified in document metadata |
| <OPR> | <dcterms:provenance> | Immediate source of acquisition | |
| <RecordNumber> | <dc:identifier> | Identifier | Only the numbers to the right of the slash in this field are used - eg 04-4000/0000070 --> 0000070 |
| n/a | <dcterms:extent> | Physical description | Count of documents in the SIP plus fixed text: "digital objects" |
| n/a | n/a | Level of description | Level of description taken from METS structMap div TYPE |
| <FullClassificationNumber> | <dcterms:isPartOf> | n/a | Field does not map to RAD but is used along with <OPR> to determine DIP upload destination |

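Three rows of the container mapping involve a small transformation rather than a direct copy: the identifier, the date range and the extent. The helpers below are a hedged sketch of those rules with hypothetical function names. They assume the DateModified values are ISO 8601 timestamps like the samples on this page.

```python
from datetime import datetime


def container_identifier(record_number):
    """Keep only the part to the right of the slash, e.g. 04-4000/0000070."""
    return record_number.rsplit("/", 1)[-1]


def creation_date_range(date_modified_values):
    """Build the date range from the earliest and latest DateModified values."""
    # Each timestamp is reduced to the calendar date in its own UTC offset.
    dates = sorted(datetime.fromisoformat(value).date() for value in date_modified_values)
    return f"{dates[0].isoformat()} - {dates[-1].isoformat()}"


def extent(document_count):
    """Count of documents in the SIP plus the fixed text."""
    return f"{document_count} digital objects"
```

`container_identifier` applies to containers only. Document record numbers such as DOC/2010/000100 are carried over unchanged according to the document metadata mapping.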

Sample container description

| TRIM | AtoM |
|---|---|
| <TitleFreeTextPart> PCI Compliance | Title proper: PCI Compliance |
| <Department> IT Strategy, Business Relationships and Projects - IT | Name: IT Strategy, Business Relationships and Projects - IT |
| <DateModified> 2010-03-01T18:20:15-08:00 / 2012-05-01T19:26:23-08:00 | Date(s) of creation: 2010-03-01 - 2012-05-01 |
| <OPR> IT Business Strategies | Immediate source of acquisition: IT Business Strategies |
| <RecordNumber> 04-4000/0000070 | Identifier: 0000070 |
| n/a | Physical description: 184 digital objects |
| n/a | Level of description: File |
| <FullClassificationNumber> 04-4000-20 | |


Document metadata mapping

| TRIM element | DC element | RAD/AtoM element | Comments |
|---|---|---|---|
| <TitleFreeTextPart> | <dc:title> | Title proper | |
| <DateModified> | <dc:date> | Date(s) of creation | |
| <RecordNumber> | <dc:identifier> | Identifier | |
| n/a | n/a | Level of description | Level of description will be obtained from METS StructMap div TYPE |



Sample document description

| TRIM | AtoM |
|---|---|
| <TitleFreeTextPart> MCPP Project Report | Title proper: MCPP Project Report |
| <DateModified> 2010-03-01T18:20:15-08:00 | Date(s) of creation: 2010-03-01 |
| <RecordNumber> DOC/2010/000100 | Identifier: DOC/2010/000100 |
| n/a | Level of description: Item |



amdSecs

  • Each container will have an amdSec consisting of:
    • A digiprovMD with an xlink reference to metadata/ContainerMetadata.xml


Sample amdSec for a container


  • Each file will have an amdSec consisting of:
    • A rightsMD populated with PREMIS rights (see Flagging closed AIPs, below)
    • A digiprovMD with an xlink reference to the relevant document metadata XML file
    • A techMD and digiprovMDs generated by Archivematica during processing


Sample amdSec for a file


fileSec and structMaps

  • Each METS file will have two structMaps, the Archivematica default structMap and a logical structMap for hierarchically arranging the container into a file and its child items
  • The container and file div TYPE elements in the logical structMap will map to the RAD Level of description field in AtoM
  • The structMap contains the links between containers and files and their relevant dmdSecs
  • The structMap also contains the link between the container and its amdSec
  • The files are linked to their amdSecs in the fileSec


StructMapg.png


Flagging closed AIPs

  • The container metadata file (ContainerMetadata.xml) has two fields whose values will be used to populate the PREMIS rights entity in the SIP (in the METS <rightsMD> element), DateClosed and RetentionSchedule. Examples are:
    • <DateClosed>2012-08-17T16:13:31-08:00</DateClosed>
    • <RetentionSchedule>EV2.3.A</RetentionSchedule>
  • The DateClosed field will be used to populate the termOfRestriction startDate in the PREMIS rights entity
  • The DateClosed and RetentionSchedule fields will be used to calculate the termOfRestriction endDate in the PREMIS rights entity. For the examples provided above, Archivematica would calculate 5 years from the end of 2012-08-17 and then to the end of the calendar year, for a result of 2017-12-31.
  • The closure period would also be captured as a standardized free text entry in the rightsGrantedNote field of the PREMIS rights entity, for example: Closed until 2017-12-31.
  • Other PREMIS fields would be auto-populated for every VanDocs ingest as shown in the screenshot below.


VanDocs rights.png
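The end-date arithmetic described above can be sketched as follows. How a RetentionSchedule code such as EV2.3.A translates into a number of years is not described on this page, so the hypothetical `closure_years` parameter stands in for that lookup. Adding whole years and then rolling forward to the end of the calendar year always lands on 31 December of the resulting year.

```python
from datetime import date, datetime


def restriction_dates(date_closed, closure_years):
    """Return (startDate, endDate) for termOfRestriction as ISO dates."""
    closed = datetime.fromisoformat(date_closed).date()
    # Assumption: closure_years has already been derived from RetentionSchedule.
    end = date(closed.year + closure_years, 12, 31)
    return closed.isoformat(), end.isoformat()


def rights_granted_note(end_date):
    """Standardized free text for the rightsGrantedNote field."""
    return f"Closed until {end_date}"
```

With the sample values, `restriction_dates("2012-08-17T16:13:31-08:00", 5)` gives a start date of 2012-08-17 and an end date of 2017-12-31.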

DIP upload

  • Upon DIP upload to AtoM, the container will become a file-level description, with level of description populated by the structMap div label for the container ("file"). Each object in the DIP will become a child level with the level of description populated by the structMap div label for the object ("item").
  • Descriptive metadata in RAD will be populated by the appropriate dmdSec for each container and object (see container and document metadata mapping, above).