Difference between revisions of "METS"

From Archivematica

Revision as of 12:14, 13 October 2011


Basic outline

  • Every METS file has the same basic structure, regardless of the kind of transfer the AIP was derived from.
METS outline.png

<dmdSec>

  • There may be one dmdSec for the AIP as a whole. Each original file may also have a dmdSec.
  • The dmdSecs are numbered dmdSec_01, dmdSec_02 etc.
  • The dmdSec contains Dublin Core metadata. If the user does not enter any DC metadata during transfer/ingest and no DC metadata was included in the transfer (e.g. as part of a DSpace export), there will be no dmdSec.
  • The dmdSec may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.
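
The numbering rule above can be sketched in Python. This is a minimal illustration, not Archivematica's own code: the function name `build_dmdsecs` is hypothetical, and the use of `mdWrap` with `MDTYPE="DC"` is an assumption about how the Dublin Core elements are wrapped.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dmdsecs(descriptions):
    """Build one dmdSec per non-empty Dublin Core description.

    `descriptions` is a list of dicts mapping DC element names to values
    (hypothetical input shape). Empty dicts are skipped, so when no DC
    metadata is supplied no dmdSec is produced.
    """
    sections = []
    for fields in descriptions:
        if not fields:
            continue
        # IDs are consecutive: dmdSec_01, dmdSec_02, ...
        dmd_id = f"dmdSec_{len(sections) + 1:02d}"
        dmdsec = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
        # Assumption: DC metadata is embedded via mdWrap/xmlData.
        wrap = ET.SubElement(dmdsec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
        xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
        for name, value in fields.items():
            ET.SubElement(xml_data, f"{{{DC_NS}}}{name}").text = value
        sections.append(dmdsec)
    return sections
```

Calling `build_dmdsecs([{"title": "Lion"}, {}, {"title": "Marbles"}])` yields two sections, because the empty description is skipped.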

<amdSec>

  • There is one amdSec for each object.
  • The amdSec is identified by the filename and UUID.
  • Each amdSec will include one digiprovMD.
  • An amdSec for an original object may also contain one or more rightsMDs. The rightsMD may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.

<fileSec>

  • There is one fileSec listing all files.
  • The fileSec is organized into the following fileGrps, only the first of which is required for all METS files:
    • original
    • preservation
    • service
    • access
    • license
    • text/ocr
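
As a rough sketch of the fileGrp rule above, the following Python builds a fileSec in which `original` is always present and the other groups appear only when they contain files. The function name `build_filesec` and its input shape are hypothetical, and the METS namespace prefix is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# fileGrp USE values in the order listed above.
FILEGRP_USES = ["original", "preservation", "service", "access", "license", "text/ocr"]


def build_filesec(files_by_use):
    """Build a fileSec from a dict mapping a USE value to a list of paths."""
    filesec = ET.Element("fileSec")
    for use in FILEGRP_USES:
        paths = files_by_use.get(use, [])
        # Only the "original" group is required; skip other empty groups.
        if use != "original" and not paths:
            continue
        grp = ET.SubElement(filesec, "fileGrp", {"USE": use})
        for path in paths:
            file_el = ET.SubElement(grp, "file")
            ET.SubElement(file_el, "FLocat", {XLINK_HREF: path})
    return filesec
```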

<structMap>

  • There is one structMap showing the physical layout of the files in the objects directory.

Detailed outline

METS outline2.png

Generic transfer

This section shows a sample METS structure for an ingested SIP containing the following:

DmdSec.png


  • /objects
    • LAND2.BMP
    • lion.svg
    • /More images
      • MARBLES.TGA


<dmdSec>

DmdSec1.png

<fileSec>

The fileSec is broken into two fileGrps, one for original files and one for preservation copies:

  • <fileGrp USE="original">
  • <fileGrp USE="preservation">

Example:

  • <fileGrp USE="original">
    • <file ID="LAND2.BMP-[UUID]" GROUPID="G1" ADMID="LAND2.BMP-[UUID]"><FLocat xlink:href="objects/LAND2.BMP" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/></file>
  • <fileGrp USE="preservation">
    • <file ID="LAND2-[UUID].tif-[UUID]" GROUPID="G1" ADMID="LAND2-[UUID].tif-[UUID]"><FLocat xlink:href="objects/LAND2-[UUID].tif" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/></file>

Note the GROUPID="G1"; this links the original file to its normalized version. Also note that the objects in the submissionDocumentation folder are treated in the same way as ingested objects.
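
To illustrate how GROUPID ties an original to its normalized copy, here is a small Python sketch that reads a fileSec and collects file IDs by group. The sample XML is a trimmed, namespace-free stand-in for the example above; `uuid1`, `uuid2` and `uuid3` are placeholder strings, and `files_by_group` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified sample: no namespaces, placeholder UUIDs.
SAMPLE = """
<fileSec>
  <fileGrp USE="original">
    <file ID="LAND2.BMP-uuid1" GROUPID="G1"/>
  </fileGrp>
  <fileGrp USE="preservation">
    <file ID="LAND2-uuid2.tif-uuid3" GROUPID="G1"/>
  </fileGrp>
</fileSec>
"""


def files_by_group(filesec_xml):
    """Map each GROUPID to a dict of {fileGrp USE: file ID}."""
    root = ET.fromstring(filesec_xml.strip())
    groups = defaultdict(dict)
    for grp in root.iter("fileGrp"):
        for file_el in grp.iter("file"):
            groups[file_el.get("GROUPID")][grp.get("USE")] = file_el.get("ID")
    return dict(groups)
```

Looking up `"G1"` in the result gives both the original and the preservation file ID, which is the link the GROUPID attribute expresses.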



Generic fileSec.png

<structMap>

The structMap section of the Archivematica METS file is designed to capture the directory structure of the AIP. Its TYPE is therefore physical (rather than logical) and it is grouped into divisions by directory, as follows:

  • AIP
    • /objects
      • /directory1
      • /directory2 etc.
      • /submissionDocumentation
Generic structMap.png


Note the DMID="01" in the objects directory div. This links the Dublin Core metadata to the contents of the objects directory.
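
The directory-based layout described in this section can be sketched as follows. This is an illustration under assumptions: the function name `build_structmap` and the div `TYPE` values `directory` and `item` are hypothetical, namespaces are omitted, and file pointers and the link to the dmdSec (the attribute the METS schema calls DMDID) are left out.

```python
import xml.etree.ElementTree as ET


def build_structmap(paths, aip_label="AIP"):
    """Build a physical structMap with one div per directory and per file.

    `paths` are POSIX-style paths relative to the AIP root.
    """
    structmap = ET.Element("structMap", {"TYPE": "physical"})
    root_div = ET.SubElement(structmap, "div", {"TYPE": "directory", "LABEL": aip_label})
    for path in sorted(paths):
        parent = root_div
        *dirs, filename = path.split("/")
        for name in dirs:
            # Reuse the div for this directory if it already exists.
            child = next(
                (d for d in parent.findall("div")
                 if d.get("TYPE") == "directory" and d.get("LABEL") == name),
                None,
            )
            if child is None:
                child = ET.SubElement(parent, "div", {"TYPE": "directory", "LABEL": name})
            parent = child
        ET.SubElement(parent, "div", {"TYPE": "item", "LABEL": filename})
    return structmap
```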

DSpace transfer

A typical DSpace transfer will contain objects, licenses, METS files and, if the objects are scanned PDF files, OCR text files. In this example, the PDF files are scanned articles, the files without extensions are licenses, and the .txt files are OCR text for the articles. Note that there is a collection-level METS file in the objects directory. On ingest, all the METS files are assigned UUIDs and moved to the metadata directory of the SIP.

  • /objects
    • mets.xml
    • /Item@249-2700
      • bitstream_8266.pdf
      • bitstream_8267
      • bitstream_40314.txt
      • mets.xml
    • /Item@249-2701
      • bitstream_8268.pdf
      • bitstream_8269
      • bitstream_39530.txt
      • mets.xml

<dmdSec>

The descriptive metadata for DSpace collections are contained in the METS files that come with the DSpace export, so the Archivematica METS file will point to them using mdRef. In this example, dmdSec_01 points to the collection-level METS file; dmdSec_02 and dmdSec_03 point to the METS files for the objects:

DSpace dmdSec.png
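
The mdRef approach described above might look like this in Python. It is a sketch only: `build_mdref_dmdsecs` is a hypothetical name, the paths passed in are placeholders, and the `MDTYPE`, `LOCTYPE` and `OTHERLOCTYPE` values are assumptions rather than values confirmed by this page. Namespaces other than xlink are omitted.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def build_mdref_dmdsecs(mets_paths):
    """Build one dmdSec per external METS file, each pointing to it via mdRef.

    The first path is expected to be the collection-level METS file,
    followed by the item-level files.
    """
    sections = []
    for index, path in enumerate(mets_paths, start=1):
        dmdsec = ET.Element("dmdSec", {"ID": f"dmdSec_{index:02d}"})
        ET.SubElement(
            dmdsec,
            "mdRef",
            {
                XLINK_HREF: path,
                # Assumed attribute values, for illustration only.
                "MDTYPE": "OTHER",
                "LOCTYPE": "OTHER",
                "OTHERLOCTYPE": "SYSTEM",
            },
        )
        sections.append(dmdsec)
    return sections
```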

<fileSec>

The fileSec is broken into four fileGrps as follows:

  • <fileGrp USE="original">
  • <fileGrp USE="preservation">
  • <fileGrp USE="text/ocr">
  • <fileGrp USE="license">
DSpace fileSec.png


<structMap>

DSpace structMap.png

Digitization output

A SIP containing the output of a digitization project may contain service and access copies of objects in addition to master copies. Normalization will not typically be required on any of the objects. An example of this kind of output would be unedited TIFF files with colour targets (master), edited TIFF files with colour targets removed (service) and high-resolution JPEGs (access copies). In some cases the access copies will be moved to the DIP; in others they will be copied to the DIP and also retained in the objects directory.

The SIP should be structured so that the objects are clearly identified as master, service or access copies, i.e. through the directory structure:

  • /objects
    • file1.tif
    • file2.tif
    • /service
      • file1.tif
      • file2.tif
    • /access
      • file1.jpg
      • file2.jpg

<fileSec>

We will keep <fileGrp USE="original"> for master copies to stay consistent with the generic METS file structure. So a typical fileSec for an AIP containing digitization output would consist of the following:

  • <fileGrp USE="original">
  • <fileGrp USE="service">
  • <fileGrp USE="access">

There may also be a <fileGrp USE="preservation"> for normalized copies of the submission documentation, or if any of the ingested objects are normalized.
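
A minimal Python sketch of the directory convention above: it assigns a fileGrp USE value to a path based on whether the file sits under /objects/service or /objects/access. The function name `filegrp_use` is hypothetical, and the rule that anything else counts as `original` is an assumption drawn from the directory layout shown in this section.

```python
from pathlib import PurePosixPath


def filegrp_use(path):
    """Return the fileGrp USE value for a path relative to the SIP root."""
    parts = PurePosixPath(path).parts
    # objects/service/<file> and objects/access/<file> map to their own groups.
    if len(parts) > 2 and parts[0] == "objects" and parts[1] in ("service", "access"):
        return parts[1]
    # Master copies stay in the "original" group.
    return "original"
```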

<structMap>

Digitized structMap.png