Difference between revisions of "METS"

From Archivematica
 
(54 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > [[Metadata elements]] > METS
 
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > [[Metadata elements]] > METS
  
=Requirements=
+
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
*The METS file will have a basic generic structure which will be present for all AIPs derived from different kinds of transfers. Certain types of transfers, such as those from DSpace, will have additional requirements but will have the same basic structure.
 
  
=Generic transfer=
+
=METS for Archivematica AIP=
This section shows a sample METS structure for an ingested SIP containing the following:
 
  
*/objects
+
==Basic outline==
**LAND2.BMP
+
*The METS file has a basic generic structure that is present for all AIPs, whatever kind of transfer they derive from. Archivematica currently uses METS version 1.11.
**lion.svg
+
[[File:METS_outline.png|680px|thumb|center|]]
**/More images
+
*Sample METS from 1.0: [[Media:METS.752545fa-6869-41d4-95b1-710ac659525d.xml]]
***MARBLES.TGA
+
===<dmdSec>===
 +
*There may be one dmdSec for the AIP as a whole. Each original file may also have a dmdSec.
 +
*The dmdSecs are numbered dmdSec_1, dmdSec_2 etc.
 +
*The dmdSec contains Dublin Core metadata. If the user does not enter any DC metadata during transfer/ingest and no DC metadata was included in the transfer (e.g. as part of a DSpace export), there will be no dmdSec.
 +
*The dmdSec may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.  
  
==<dmdSec>==
+
===<amdSec>===
The dmdSec for a generic transfer is simple Dublin Core, except for <isPartOf>, which is a qualification of <Relation>.
+
*There is one amdSec for each object.
 +
*The amdSecs are numbered amdSec_1, amdSec_2 etc.
 +
*Each amdSec will include one techMD and multiple digiprovMDs.
 +
*An amdSec for an original object may also contain one or more rightsMDs. The rightsMD may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.  
  
[[File:dmdSec1.png|680px|thumb|center|]]
+
===<fileSec>===
 +
*There is one fileSec listing all files.
 +
*The fileSec is organized into the following fileGrps:
 +
**original
 +
**preservation
 +
**service
 +
**access
 +
**submissionDocumentation
 +
**license
 +
**text/ocr
 +
*''Original'' is required for all METS files.
 +
*''SubmissionDocumentation'' is included if the AIP includes submission documentation.
 +
*''Preservation'' is included if the AIP includes normalized files.
 +
*''Service'' and ''access'' may be used if the AIP contains those subfolders, i.e. as the output of digitization workflows.
 +
*''License'' and ''text/ocr'' are used if the AIP was created from a DSpace export containing licenses and ocr text files.
  
==<fileSec>==
+
===<structMap>===
The fileSec is broken into two fileGrps, one for original files and one for preservation copies:
+
*As of Archivematica 1.7 there are two structMaps.  The first is labeled "Archivematica Default" and shows the physical layout of the files in the objects directory.
*<fileGrp USE="original">
 
*<fileGrp USE="preservation">
 
  
Example:
+
The second is labeled "Normative Directory Structure" and shows the logical structure of the files in the objects directory. This second structMap is needed to document empty directories before they are deleted at 'store AIP' in the Storage Service. At AIP re-ingest, this logical structMap is parsed to re-create the empty directories.
*<fileGrp USE="original">
 
**<file ID="LAND2.BMP-[UUID]" GROUPID="G1" ADMID="digiprov-LAND2.BMP-[UUID]"><FLocat xlink:href="objects/LAND2.BMP" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/></file>
 
*<fileGrp USE="preservation">
 
**<file ID="LAND2-[UUID].tif-[UUID]" GROUPID="G1" ADMID="LAND2-[UUID].tif-[UUID]"><FLocat xlink:href="objects/LAND2-[UUID].tif" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/></file>
 
  
Note the GROUPID="G1"; this links the original file to its normalized version. Also note that the objects in the submissionDocumentation folder are treated in the same way as ingested objects.
+
==Detailed outline==
  
 +
[[File:METS_outline_1a.png|680px|thumb|center|]]
  
</br>
+
[[File:METS_outline_1b.png|680px|thumb|center|]]
  
[[File:generic_fileSec.png|680px|thumb|center|]]
+
==dmdSec==
  
==<structMap>==
+
===mdWrap===
The structMap section of the Archivematica METS file is designed to capture the directory structure of the AIP. Its TYPE is therefore physical (rather than logical) and it is grouped into divisions by directory, as follows:
 
  
*AIP
+
The dmdSec consists of simple Dublin Core, except for <isPartOf>, which is a qualification of <Relation>.
**/objects
 
***/directory1
 
***/directory2 etc.
 
***/submissionDocumentation
 
  
[[File:generic_structMap.png|680px|thumb|center|]]
+
[[File:METS_dmdSec.png|680px|thumb|center|]]
  
</br>
+
===mdRef===
Note the DMID="AIP-description" in the objects directory div. This links the Dublin Core metadata to the contents of the objects directory.
+
The descriptive metadata for DSpace collections are contained in the METS files that come with the DSpace export, so the Archivematica METS file will point to them using mdRef.
  
 +
[[File:METS_mdRef.png|680px|thumb|center|]]
  
=DSpace transfer=
+
==amdSec==
A typical DSpace transfer will contain objects, licenses and ocr text files if the objects are scanned pdf files. In this example, the pdf files are scanned articles, the files without extensions are licenses and the txt files are ocr text for the articles:
+
The amdSec consists of either PREMIS metadata created by Archivematica or (in the case of rights metadata) references to external content. The screenshot below shows the amdSec for a file that has two rightsMD sections, one with rights captured in the PREMIS metadata (rightsMD_1) and one pointing to rightsMD sections in an external METS file (rightsMD_2).
*/objects
 
**/Item@249-2700
 
***bitstream_8262.pdf
 
***bitstream_8263
 
***bitstream_42698.txt
 
**/Item@249-2701
 
***bitstream_8264.pdf
 
***bitstream_8265
 
***bitstream_42699.txt
 
  
==<fileSec>==
+
[[File:METS_amdSec.png|680px|thumb|center|]]
The fileSec is broken into four fileGrps as follows:
 
*<fileGrp USE="original">
 
*<fileGrp USE="preservation">
 
*<fileGrp USE="text/ocr">
 
*<fileGrp USE="license">
 
  
[[File:DSpace_fileSec.png|680px|thumb|center|]]
+
=METS for Archivematica transfer=
 +
Archivematica creates a METS file for each transfer showing a structMap for the transfer.
  
 +
[[File:METS_transfer.png|680px|thumb|center|]]
  
==<structMap>==
+
=Changes for 0.9=
  
[[File:DSpace_structMap.png|680px|thumb|center|]]
+
In order to allow users to display digital files in a specified order in the access system, the following changes will be made to the structMap in Archivematica 0.9:
 +
*Files in the structMap will be ordered alphabetically by original name
 +
**Numbers will be respected
 +
**Case will be ignored
 +
**If the objects directory has subdirectories, the structMap will sort alphabetically by directory and within each directory
 +
*Each file will be placed in its own div
 +
*If desired, div labels can be applied to files via a csv file entitled ''file_labels.csv'' included in the metadata directory of the transfer
 +
**The csv file would consist of two columns: filename and label
 +
**The div labels would map to the title field in the access system
 +
**The div labels would be applied only to original versions of the files, not normalized versions
  
=Digitization output=
+
[[File:structMap-09.png|680px|thumb|center|]]
A SIP containing the output of a digitization project may contain service and access copies of objects in addition to master copies. Normalization will not typically be required on any of the objects. An example of this kind of output would be unedited tiff files with colour targets (master), edited tiff files with colour targets removed (service) and high-resolution jpegs (access copies). In some cases the access copies will be moved to the DIP; in others they will be copied to the DIP and also retained in the objects directory.
 
 
 
The SIP should be structured so that the objects are clearly identified as master, service or access copies, i.e. through the directory structure:
 
*/objects
 
**file1.tif
 
**file2.tif
 
**/service
 
***file1.tif
 
***file2.tif
 
**/access
 
***file1.jpg
 
***file2.jpg
 
 
 
==<fileSec>==
 
 
 
We will keep <fileGrp USE="original"> for master copies to stay consistent with the generic METS file structure. So a typical fileSec for an AIP containing digitization output would consist of the following:
 
*<fileGrp USE="original">
 
*<fileGrp USE="service">
 
*<fileGrp USE="access">
 
 
 
There may also be a <fileGrp USE="preservation"> for normalized copies of the submission documentation, or if any of the ingested objects are normalized.
 
 
 
==<structMap>==
 

Latest revision as of 15:34, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

METS for Archivematica AIP

Basic outline

  • The METS file has a basic generic structure that is present for all AIPs, whatever kind of transfer they derive from. Archivematica currently uses METS version 1.11.

[Figure: METS outline.png]

<dmdSec>

  • There may be one dmdSec for the AIP as a whole. Each original file may also have a dmdSec.
  • The dmdSecs are numbered dmdSec_1, dmdSec_2 etc.
  • The dmdSec contains Dublin Core metadata. If the user does not enter any DC metadata during transfer/ingest and no DC metadata was included in the transfer (e.g. as part of a DSpace export), there will be no dmdSec.
  • The dmdSec may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.
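
The distinction between embedded and referenced descriptive metadata can be checked with a few lines of Python. This is only a minimal sketch: the `SAMPLE` fragment, the `objects/Item/mets.xml` path and the `summarize_dmdsecs` helper are hypothetical and not taken from an actual Archivematica AIP. Only the METS namespace and element names (`dmdSec`, `mdWrap`, `mdRef`) come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily trimmed METS fragment for illustration only.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="DC"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="dmdSec_2">
    <mets:mdRef LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" MDTYPE="OTHER"
                xlink:href="objects/Item/mets.xml"/>
  </mets:dmdSec>
</mets:mets>
"""


def summarize_dmdsecs(mets_xml):
    """Return (ID, kind) pairs; kind is 'mdWrap', 'mdRef' or 'empty'."""
    root = ET.fromstring(mets_xml)
    summary = []
    for dmd in root.findall("mets:dmdSec", METS_NS):
        if dmd.find("mets:mdWrap", METS_NS) is not None:
            kind = "mdWrap"  # metadata embedded in this METS file
        elif dmd.find("mets:mdRef", METS_NS) is not None:
            kind = "mdRef"   # pointer to metadata held in another file
        else:
            kind = "empty"
        summary.append((dmd.get("ID"), kind))
    return summary


print(summarize_dmdsecs(SAMPLE))
```

An empty result would correspond to the case described above in which no Dublin Core metadata was entered or included in the transfer.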

<amdSec>

  • There is one amdSec for each object.
  • The amdSecs are numbered amdSec_1, amdSec_2 etc.
  • Each amdSec will include one techMD and multiple digiprovMDs.
  • An amdSec for an original object may also contain one or more rightsMDs. The rightsMD may contain a reference to metadata in another file, such as a mets.xml file included in a DSpace export.
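
As a rough illustration of the rules above, the following sketch counts the techMD, rightsMD and digiprovMD children of each amdSec. The `SAMPLE` fragment, its IDs and the `count_amd_children` helper are assumptions made for the example. A real AIP METS file wraps PREMIS metadata inside each of these elements.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical fragment: the child elements are left empty for brevity.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="amdSec_1">
    <mets:techMD ID="techMD_1"/>
    <mets:rightsMD ID="rightsMD_1"/>
    <mets:digiprovMD ID="digiprovMD_1"/>
    <mets:digiprovMD ID="digiprovMD_2"/>
  </mets:amdSec>
  <mets:amdSec ID="amdSec_2">
    <mets:techMD ID="techMD_2"/>
    <mets:digiprovMD ID="digiprovMD_3"/>
  </mets:amdSec>
</mets:mets>
"""


def count_amd_children(mets_xml):
    """Map each amdSec ID to the number of techMD/rightsMD/digiprovMD children."""
    root = ET.fromstring(mets_xml)
    counts = {}
    for amd in root.findall("mets:amdSec", METS_NS):
        counts[amd.get("ID")] = {
            name: len(amd.findall(f"mets:{name}", METS_NS))
            for name in ("techMD", "rightsMD", "digiprovMD")
        }
    return counts


for amd_id, children in count_amd_children(SAMPLE).items():
    print(amd_id, children)
```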

<fileSec>

  • There is one fileSec listing all files.
  • The fileSec is organized into the following fileGrps:
    • original
    • preservation
    • service
    • access
    • submissionDocumentation
    • license
    • text/ocr
  • Original is required for all METS files.
  • SubmissionDocumentation is included if the AIP includes submission documentation.
  • Preservation is included if the AIP includes normalized files.
  • Service and access may be used if the AIP contains those subfolders, i.e. as the output of digitization workflows.
  • License and text/ocr are used if the AIP was created from a DSpace export containing licenses and ocr text files.
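
The grouping described above can be inspected by collecting file locations per fileGrp USE value. The sketch below assumes a hypothetical two-file fragment: the file IDs, the `objects/LAND2.tif` path and the `files_by_use` helper are illustrative only.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fragment with one original and one preservation copy.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1" ADMID="amdSec_1">
        <mets:FLocat xlink:href="objects/LAND2.BMP"
                     LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="preservation">
      <mets:file ID="file-2" ADMID="amdSec_2">
        <mets:FLocat xlink:href="objects/LAND2.tif"
                     LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def files_by_use(mets_xml):
    """Map each fileGrp USE value to the list of file locations it contains."""
    root = ET.fromstring(mets_xml)
    groups = {}
    for grp in root.findall("mets:fileSec/mets:fileGrp", METS_NS):
        hrefs = [
            flocat.get(XLINK_HREF)
            for flocat in grp.findall("mets:file/mets:FLocat", METS_NS)
        ]
        groups.setdefault(grp.get("USE"), []).extend(hrefs)
    return groups


groups = files_by_use(SAMPLE)
# "original" is the one fileGrp the text above describes as always required.
print("original" in groups, groups)
```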

<structMap>

  • As of Archivematica 1.7 there are two structMaps. The first is labeled "Archivematica Default" and shows the physical layout of the files in the objects directory.

The second is labeled "Normative Directory Structure" and shows the logical structure of the files in the objects directory. This second structMap is needed to document empty directories before they are deleted at 'store AIP' in the Storage Service. At AIP re-ingest, this logical structMap is parsed to re-create the empty directories.
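
To make the role of the second structMap concrete, here is a minimal sketch that selects a structMap by its LABEL and lists directory divs with no children. It is not the Archivematica re-ingest code. The `SAMPLE` fragment, the `TYPE="Directory"` and `TYPE="Item"` div values, and the `empty_directories` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical fragment; div TYPE values are assumed for this example.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical" LABEL="Archivematica Default">
    <mets:div TYPE="Directory" LABEL="objects">
      <mets:div TYPE="Item" LABEL="lion.svg"/>
    </mets:div>
  </mets:structMap>
  <mets:structMap TYPE="logical" LABEL="Normative Directory Structure">
    <mets:div TYPE="Directory" LABEL="objects">
      <mets:div TYPE="Item" LABEL="lion.svg"/>
      <mets:div TYPE="Directory" LABEL="empty_dir"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def empty_directories(mets_xml, label="Normative Directory Structure"):
    """Return paths of directory divs without children in the labeled structMap."""
    root = ET.fromstring(mets_xml)
    found = []

    def walk(div, parents):
        path = parents + [div.get("LABEL", "")]
        children = div.findall("mets:div", METS_NS)
        if div.get("TYPE") == "Directory" and not children:
            found.append("/".join(path))
        for child in children:
            walk(child, path)

    for struct_map in root.findall("mets:structMap", METS_NS):
        if struct_map.get("LABEL") == label:
            for top in struct_map.findall("mets:div", METS_NS):
                walk(top, [])
    return found


print(empty_directories(SAMPLE))
```

Running the same function with `label="Archivematica Default"` on this sample finds nothing, since the physical structMap only lists what is actually stored.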

Detailed outline

[Figure: METS outline 1a.png]
[Figure: METS outline 1b.png]

dmdSec

mdWrap

The dmdSec consists of simple Dublin Core, except for <isPartOf>, which is a qualification of <Relation>.

[Figure: METS dmdSec.png]

mdRef

The descriptive metadata for DSpace collections are contained in the METS files that come with the DSpace export, so the Archivematica METS file will point to them using mdRef.

[Figure: METS mdRef.png]

amdSec

The amdSec consists of either PREMIS metadata created by Archivematica or (in the case of rights metadata) references to external content. The screenshot below shows the amdSec for a file that has two rightsMD sections, one with rights captured in the PREMIS metadata (rightsMD_1) and one pointing to rightsMD sections in an external METS file (rightsMD_2).

[Figure: METS amdSec.png]

METS for Archivematica transfer

Archivematica creates a METS file for each transfer showing a structMap for the transfer.

[Figure: METS transfer.png]

Changes for 0.9

In order to allow users to display digital files in a specified order in the access system, the following changes will be made to the structMap in Archivematica 0.9:

  • Files in the structMap will be ordered alphabetically by original name
    • Numbers will be respected
    • Case will be ignored
    • If the objects directory has subdirectories, the structMap will sort alphabetically by directory and within each directory
  • Each file will be placed in its own div
  • If desired, div labels can be applied to files via a csv file entitled file_labels.csv included in the metadata directory of the transfer
    • The csv file would consist of two columns: filename and label
    • The div labels would map to the title field in the access system
    • The div labels would be applied only to original versions of the files, not normalized versions
[Figure: StructMap-09.png]
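
The ordering and labeling rules listed above could be sketched as follows. This is one interpretation, not the Archivematica implementation. It reads "numbers will be respected" as natural ordering (page2 before page10), assumes `file_labels.csv` has no header row, and assumes filenames in it are relative to the objects directory. The helper names and sample paths are hypothetical.

```python
import csv
import io
import re


def natural_key(name):
    """Case-insensitive key in which runs of digits compare as integers."""
    parts = re.split(r"(\d+)", name.lower())
    # re.split with a capture group alternates text, digits, text, ...
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def path_key(path):
    """Sort by directory first, then by name within each directory."""
    return [natural_key(segment) for segment in path.split("/")]


def read_labels(csv_text):
    """Parse two-column rows (filename, label) into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def ordered_divs(paths, labels):
    """Return (path, label) pairs in structMap order; label may be None."""
    return [(p, labels.get(p)) for p in sorted(paths, key=path_key)]


paths = ["lion.svg", "page10.tif", "More images/MARBLES.TGA",
         "LAND2.BMP", "page2.tif"]
labels = read_labels("lion.svg,A lion\npage2.tif,Second page\n")

for path, label in ordered_divs(paths, labels):
    print(path, "->", label)
```

In this sketch, files and subdirectories at the same level are interleaved alphabetically. Whether directories should instead be listed before or after loose files is not specified above.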