DSpace exports

From Archivematica

Revision as of 17:09, 19 September 2011

This page analyzes the structure of a DSpace collection export from an uncustomized (i.e. out of the box) DSpace installation.

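Used the following command (from the DSpace 1.7.1 user documentation on AIP backup and restore, http://www.dspace.org/1_7_1Documentation/AIP%20Backup%20and%20Restore.html#AIPBackupandRestore-ExportingAIPHierarchy) to export a two-item collection with the handle 123456789-6: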

./dspace packager -d -a -t AIP -e <user name> -i 123456789-6 calamy.zip

This results in the export of three zipped packages: one for the collection and one for each of the items:

  • calamy.zip
  • ITEM@123456789-7.zip
  • ITEM@123456789-8.zip
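The child package names appear to follow a TYPE@handle pattern, with the "/" of the handle replaced by "-" (compare the collection's ID "DSpace_COLLECTION_123456789-6" with its OBJID "hdl:123456789/6" further down). The collection package simply took the name given on the command line. A minimal sketch of recovering the object type and handle from a package name, assuming that naming pattern holds; `parse_package_name` is a hypothetical helper, not part of DSpace or Archivematica:

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern for child packages: <TYPE>@<prefix>-<suffix>.zip
PACKAGE_NAME = re.compile(r"^(?P<type>[A-Z]+)@(?P<prefix>[^-]+)-(?P<suffix>.+)\.zip$")


def parse_package_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (object type, handle) for a child package, or None when the
    name does not match the assumed pattern (e.g. calamy.zip)."""
    match = PACKAGE_NAME.match(filename)
    if match is None:
        return None
    # Rebuild the handle in its usual prefix/suffix form.
    handle = f"{match['prefix']}/{match['suffix']}"
    return match["type"], handle


for name in ["calamy.zip", "ITEM@123456789-7.zip", "ITEM@123456789-8.zip"]:
    print(name, parse_package_name(name))
```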

The extracted contents of each zipped file are shown in this screenshot:

[Screenshot: Export.png]

Item-level METS files

Handle

  • The mets.xml file in each item package is linked to its object by the handle of the original zipped file:
[Screenshot: MetsID.png]
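A minimal sketch of reading that link programmatically, assuming that (as in the collection-level file) the handle is recorded in the OBJID attribute of the root <mets> element. The sample is a hypothetical, heavily trimmed item header; for a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, trimmed-down item-level root element.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      ID="DSpace_ITEM_123456789-7" OBJID="hdl:123456789/7" TYPE="DSpace ITEM"/>"""


def mets_handle(xml_text: str) -> str:
    """Return the handle from the root OBJID attribute, minus any 'hdl:' prefix."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{METS_NS}}}mets":
        raise ValueError("not a METS document")
    objid = root.get("OBJID", "")
    return objid[len("hdl:"):] if objid.startswith("hdl:") else objid


print(mets_handle(SAMPLE))  # 123456789/7
```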


Licenses

The text file bitstreams in the two item-level directories are licenses. Note that they are not identified by filename as license files; Archivematica will need to recognize license files from each object's METS file (i.e. from <fileSec>). Here is an example of the <fileSec> showing the object to be preserved (bitstream_12.png) and its license file (bitstream_13):

[Screenshot: FileSec.png]
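A minimal sketch of that lookup, assuming DSpace groups bitstreams in <fileGrp> elements whose USE attribute names the bundle (ORIGINAL for the preserved object, LICENSE for the license). The sample <fileSec> is a hypothetical reduction of the one in the screenshot:

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fileSec; the USE values are an assumption about DSpace's output.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ID="bitstream_12"><FLocat LOCTYPE="URL" xlink:href="bitstream_12.png"/></file>
    </fileGrp>
    <fileGrp USE="LICENSE">
      <file ID="bitstream_13"><FLocat LOCTYPE="URL" xlink:href="bitstream_13"/></file>
    </fileGrp>
  </fileSec>
</mets>"""


def files_by_use(xml_text: str) -> dict:
    """Map each FLocat href in the fileSec to the USE value of its fileGrp."""
    root = ET.fromstring(xml_text)
    roles = {}
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for flocat in group.findall("mets:file/mets:FLocat", NS):
            roles[flocat.get(XLINK_HREF)] = group.get("USE")
    return roles


roles = files_by_use(SAMPLE)
licenses = [href for href, use in roles.items() if use == "LICENSE"]
print(licenses)  # ['bitstream_13']
```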

Archivematica should move the license file to the metadata/submissionDocumentation directory; the text can be parsed into the <rights> container in the PREMIS metadata (see PREMIS metadata: rights#License-based).
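A minimal sketch of the parsing step, assuming PREMIS 2 element names (rightsStatement, rightsBasis, licenseInformation, licenseTerms) in the info:lc/xmlns/premis-v2 namespace. Putting the whole license text into licenseTerms is an assumption, and the helper name is hypothetical; the linked PREMIS page is where the intended mapping is worked out:

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def license_rights_statement(license_text: str) -> ET.Element:
    """Wrap DSpace license text in a PREMIS rightsStatement with a License basis.

    The rightsStatementIdentifier that PREMIS requires is omitted for brevity.
    """
    statement = ET.Element(q("rightsStatement"))
    ET.SubElement(statement, q("rightsBasis")).text = "License"
    info = ET.SubElement(statement, q("licenseInformation"))
    # Assumption: the full license text goes into licenseTerms.
    ET.SubElement(info, q("licenseTerms")).text = license_text.strip()
    return statement


element = license_rights_statement("Example license text.\n")
print(ET.tostring(element, encoding="unicode"))
```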


RightsMD

Each object also has an amdSec containing rightsMD data (populated automatically according to DSpace configuration settings):

[Screenshot: Rights.png]

Should Archivematica parse this rightsMD metadata into the PREMIS metadata?
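If it does, a first step is to find the rightsMD sections and see which vocabulary each one uses. A minimal sketch that lists every rightsMD wrapper by amdSec; the sample, including the OTHER/METSRIGHTS type labels, is a hypothetical stand-in for what the screenshot shows:

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical amdSecs; the METSRIGHTS label is an assumption.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <amdSec ID="amd_1">
    <rightsMD ID="rights_1"><mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS"><xmlData/></mdWrap></rightsMD>
  </amdSec>
  <amdSec ID="amd_2">
    <techMD ID="tech_1"><mdWrap MDTYPE="PREMIS"><xmlData/></mdWrap></techMD>
    <rightsMD ID="rights_2"><mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS"><xmlData/></mdWrap></rightsMD>
  </amdSec>
</mets>"""


def rights_sections(xml_text):
    """Yield (amdSec ID, rightsMD ID, metadata type) for every embedded rightsMD."""
    root = ET.fromstring(xml_text)
    for amd in root.findall("mets:amdSec", NS):
        for rights in amd.findall("mets:rightsMD", NS):
            wrap = rights.find("mets:mdWrap", NS)
            if wrap is None:
                continue  # e.g. an mdRef pointing elsewhere instead of embedded metadata
            mdtype = wrap.get("OTHERMDTYPE") or wrap.get("MDTYPE")
            yield amd.get("ID"), rights.get("ID"), mdtype


for entry in rights_sections(SAMPLE):
    print(entry)
```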

Descriptive metadata

  • Each object has two dmdSecs: MODS and DSpace Intermediate Metadata (DIM; see https://wiki.duraspace.org/display/DSPACE/DSpaceIntermediateMetadata).
    • The DIM metadata is not intended for use outside of DSpace: according to the DSpace website, "[DIM] is used by XsltCrosswalk. It is called the Intermediate format because it is intended solely as an intermediate stage in XML-translation-based crosswalks. To reiterate, This is an INTERMEDIATE format, it is NOT for exporting or harvesting metadata!"
  • What should we do with the MODS metadata?
    • Leave it in the DSpace METS file and just link the object to its METS file?
    • Add an <mdRef> to the Archivematica METS file to link each object to its MODS metadata?
    • Add the MODS metadata to the Archivematica METS file as <mdWrap>?
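Whichever option is chosen, the MODS record first has to be picked out of the DSpace METS file while the DIM is skipped. A minimal sketch, assuming the MODS dmdSec is marked MDTYPE="MODS" and the DIM one MDTYPE="OTHER" OTHERMDTYPE="DIM". The sample record is hypothetical. The returned element is what would go inside an <mdWrap> under the third option, and the dmdSec ID is what an <mdRef> could point to under the second:

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

# Hypothetical item METS with one MODS and one DIM dmdSec.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="dmdSec_1">
    <mdWrap MDTYPE="MODS"><xmlData>
      <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
        <mods:titleInfo><mods:title>Sample item</mods:title></mods:titleInfo>
      </mods:mods>
    </xmlData></mdWrap>
  </dmdSec>
  <dmdSec ID="dmdSec_2">
    <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM"><xmlData/></mdWrap>
  </dmdSec>
</mets>"""


def mods_record(xml_text):
    """Return (dmdSec ID, mods:mods element) for the MODS dmdSec, or (None, None)."""
    root = ET.fromstring(xml_text)
    for dmd in root.findall("mets:dmdSec", NS):
        wrap = dmd.find("mets:mdWrap", NS)
        if wrap is not None and wrap.get("MDTYPE") == "MODS":
            return dmd.get("ID"), wrap.find("mets:xmlData/mods:mods", NS)
    return None, None


dmd_id, mods = mods_record(SAMPLE)
print(dmd_id, mods.find("mods:titleInfo/mods:title", NS).text)  # dmdSec_1 Sample item
```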

Collection-level mets.xml file

The mets.xml file for the collection is structured as follows:

  • <mets ID="DSpace_COLLECTION_123456789-6" OBJID="hdl:123456789/6" TYPE="DSpace COLLECTION" PROFILE="http://www.dspace.org/schema/aip/mets_aip_1_0.xsd" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  • <metsHdr>
  • <dmdSec> (contains MODS metadata for collection-level description)
  • <dmdSec> (contains DSpace Intermediate Metadata (DIM) for collection-level description; all elements mapped to Dublin Core (dc); some overlap with MODS metadata)
  • <amdSec> (contains information on DSpace users and groups associated with the collection)
  • <fileSec> (references the collection's logo, if there is one)
  • <structMap> (links the collection to its logo, if there is one, plus its two child items)
  • <structMap> (links the collection to the DSpace Community)
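A minimal sketch of following the first structMap to the collection's child items, assuming each child is referenced by an <mptr> whose xlink:href names the child package. Only the first structMap is read, because the second one holds the link to the parent Community. The sample, including the Community package name, is hypothetical:

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
MPTR = "{http://www.loc.gov/METS/}mptr"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical structMaps; COMMUNITY@123456789-1.zip is an invented parent name.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <structMap TYPE="LOGICAL">
    <div TYPE="DSpace Object Contents">
      <div TYPE="DSpace ITEM"><mptr LOCTYPE="URL" xlink:href="ITEM@123456789-7.zip"/></div>
      <div TYPE="DSpace ITEM"><mptr LOCTYPE="URL" xlink:href="ITEM@123456789-8.zip"/></div>
    </div>
  </structMap>
  <structMap TYPE="LOGICAL">
    <div TYPE="Parent"><mptr LOCTYPE="URL" xlink:href="COMMUNITY@123456789-1.zip"/></div>
  </structMap>
</mets>"""


def child_packages(xml_text):
    """Return the mptr hrefs in the first structMap (the collection's contents)."""
    root = ET.fromstring(xml_text)
    contents = root.find("mets:structMap", NS)  # first structMap in document order
    if contents is None:
        return []
    return [mptr.get(XLINK_HREF) for mptr in contents.iter(MPTR)]


print(child_packages(SAMPLE))  # ['ITEM@123456789-7.zip', 'ITEM@123456789-8.zip']
```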

Item-level mets.xml file

  • <metsHdr>
  • <dmdSec_1> (contains MODS metadata for item)
  • <dmdSec_2> (contains DIM metadata for item; all elements mapped to Dublin Core (dc); some overlap with MODS metadata)
  • <amdSec> (contains rights metadata)
  • <amdSec> (contains rights metadata)
  • <amdSec> (contains PREMIS object metadata; rights metadata; DIM metadata for the item)
  • <amdSec> (contains rights metadata)
  • <amdSec> (contains PREMIS object metadata; rights metadata; DIM metadata for the license)
  • <fileSec> (lists the item and its license)
  • <structMap> (links the bitstream to the logical object)
  • <structMap> (links the item to the collection)
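To check a given export against the outline above, the top-level sections of a METS file can be tallied. A minimal sketch; the skeleton is hypothetical and simply mirrors the item-level list (one metsHdr, two dmdSecs, five amdSecs, one fileSec, two structMaps):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical skeleton mirroring the item-level outline.
SKELETON = """<mets xmlns="http://www.loc.gov/METS/">
  <metsHdr/><dmdSec/><dmdSec/>
  <amdSec/><amdSec/><amdSec/><amdSec/><amdSec/>
  <fileSec/><structMap/><structMap/>
</mets>"""


def section_counts(xml_text):
    """Count the direct children of the root <mets> element by local name."""
    root = ET.fromstring(xml_text)
    # ElementTree tags look like '{namespace}name'; keep only 'name'.
    return Counter(child.tag.split("}")[-1] for child in root)


print(section_counts(SKELETON))
```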

Parsing a DSpace collection export in Archivematica

Requirements:

  • Map the elements of the DSpace AIPs to the Archivematica AIP
  • Structure the Archivematica mets.xml file to point to the DSpace mets.xml files
  • Index the metadata in all the xml files
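For the indexing requirement, one simple approach is to flatten each XML file into (element path, text) pairs that a search index could ingest. A minimal sketch; the slash-separated path notation and the sample MODS fragment are assumptions, and no particular indexing backend is implied:

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.split("}", 1)[-1]


def flatten(xml_text: str) -> list:
    """Return (slash-separated element path, text) for every element with text."""
    pairs = []

    def walk(element, path):
        path = f"{path}/{local_name(element.tag)}"
        text = (element.text or "").strip()
        if text:
            pairs.append((path, text))
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return pairs


# Hypothetical MODS fragment.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample item</title></titleInfo>
  <name><namePart>Sample author</namePart></name>
</mods>"""

for path, text in flatten(SAMPLE):
    print(path, "=", text)
```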

Map the elements of the DSpace AIPs to the Archivematica AIP

  • The digital objects get placed in the objects directory
  • The license files get placed in the metadata/submissionDocumentation directory; the text is parsed into the <rights> container in the PREMIS metadata (see PREMIS metadata: rights#License-based)
  • The mets.xml files get placed in the metadata/submissionDocumentation directory (open question: why not put them in the metadata directory instead?)
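The mapping above can be expressed as a lookup from a file's role to its place in the Archivematica AIP. A minimal sketch; the role labels (ORIGINAL, LICENSE, METS) are hypothetical, and the per-package subdirectory is a design assumption. It is there because every extracted package contains its own mets.xml, so three identically named files would otherwise collide:

```python
from pathlib import PurePosixPath

# Destination directories from the list above; the role labels are hypothetical.
DESTINATIONS = {
    "ORIGINAL": PurePosixPath("objects"),
    "LICENSE": PurePosixPath("metadata/submissionDocumentation"),
    "METS": PurePosixPath("metadata/submissionDocumentation"),
}


def destination(package: str, filename: str, role: str) -> PurePosixPath:
    """Return the AIP-relative path for a file extracted from a DSpace package.

    Each package gets its own subdirectory (named after the zip file without
    its extension) so that the mets.xml files do not overwrite each other.
    """
    if role not in DESTINATIONS:
        raise ValueError(f"no mapping for role {role!r}")
    return DESTINATIONS[role] / PurePosixPath(package).stem / filename


print(destination("ITEM@123456789-7.zip", "bitstream_12.png", "ORIGINAL"))
print(destination("ITEM@123456789-7.zip", "mets.xml", "METS"))
```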

Link the object to its METS file

Each object in a DSpace AIP comes with its own METS file containing descriptive and rights metadata as well as some PREMIS object metadata.

Archivematica mets.xml file:

METS file section | Description/notes
<dmdSec> | DC metadata added during transfer/ingest; SIP-level only
<amdSec> | PREMIS metadata
<fileSec> | Lists all the files in the objects directory of the AIP
<structMap> | Groups the contents of the objects directory of the AIP to reflect the folder structure of the AIP


Question: how do we link the object to the DSpace METS file? Give the METS file a UUID and make the link in the PREMIS relationships container?
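A minimal sketch of the UUID option, assuming PREMIS 2 relationship elements. The relationshipType and relationshipSubType values are placeholders, since which values to use is exactly the open question; the helper name is hypothetical:

```python
import uuid
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def mets_relationship(mets_uuid: str) -> ET.Element:
    """Build a PREMIS relationship pointing from an object to its DSpace METS file."""
    rel = ET.Element(q("relationship"))
    # Placeholder vocabulary: the type/subtype to use has not been decided.
    ET.SubElement(rel, q("relationshipType")).text = "structural"
    ET.SubElement(rel, q("relationshipSubType")).text = "is described by"
    related = ET.SubElement(rel, q("relatedObjectIdentification"))
    ET.SubElement(related, q("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(related, q("relatedObjectIdentifierValue")).text = mets_uuid
    return rel


mets_uuid = str(uuid.uuid4())  # identifier assigned to the DSpace mets.xml file
print(ET.tostring(mets_relationship(mets_uuid), encoding="unicode"))
```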