Difference between revisions of "Dataset preservation"

From Archivematica

Revision as of 17:53, 4 March 2013

Workflow

  • Metadata ingest: Metadata will be created outside of Archivematica prior to ingest and added to the metadata folder of the transfer. See Metadata, below.
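  • Metadata validation: Archivematica should include a micro-service that validates metadata on ingest, for example with xmllint. Sample validation command: xmllint --schema ddi:instance:3_1 metadata/CCRI-CDN-Census1911V20110628.xml. Note that xmllint's --schema option expects the path or URL of an XSD file, so ddi:instance:3_1 (the DDI 3.1 namespace) stands in here for the location of the DDI 3.1 instance schema.
  • Normalization: Some datasets may require manual normalization; see https://projects.artefactual.com/issues/1499.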
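
The validation step above could be sketched roughly as follows. This is only an illustration, not an existing Archivematica micro-service: the function names and the idea of passing a local schema path are assumptions, and the schema file must be supplied by whoever runs it. Only the xmllint options --noout and --schema are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_xmllint_command(schema_path, metadata_path):
    """Return the argument list for validating one metadata file against an XSD."""
    # --noout suppresses the echoed document so only validation messages remain.
    return ["xmllint", "--noout", "--schema", str(schema_path), str(metadata_path)]


def validate_metadata(schema_path, metadata_dir):
    """Validate every .xml file in metadata_dir; return {filename: passed}.

    Hypothetical helper: schema_path is assumed to point at a local copy of
    the relevant XSD (for example the DDI 3.1 instance schema).
    """
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint is not installed or not on PATH")
    results = {}
    for xml_file in sorted(Path(metadata_dir).glob("*.xml")):
        completed = subprocess.run(
            build_xmllint_command(schema_path, xml_file),
            capture_output=True,
            text=True,
        )
        # xmllint exits with status 0 when the document validates.
        results[xml_file.name] = completed.returncode == 0
    return results
```

A caller would pass the schema location and the transfer's metadata folder, for instance validate_metadata(schema_path, "metadata"), and could fail the ingest step if any value in the returned dictionary is False.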


Metadata

METS and DDI/FGDC

  • DDI is Data Documentation Initiative, a metadata specification for the social and behavioral sciences; see http://www.ddialliance.org/.
  • FGDC is Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998]; see http://www.fgdc.gov/metadata/csdgm/
  • DDI and FGDC are considered descriptive metadata (dmdSec) in METS. From http://www.loc.gov/standards/mets/METSOverview.v2.html: "Valid values for the MDTYPE element [in mdSec] include...DDI (Data Documentation Initiative), FGDC (Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998])."
    • In the Archivematica METS file, a DDI or FGDC file could be referenced from the dmdSec using mdRef, for example as follows: <mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>.
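
As a rough sketch of how such an mdRef might be generated, the snippet below builds a dmdSec element with Python's standard xml.etree.ElementTree. The function name build_dmdsec and the ID value dmdSec_1 are hypothetical; the attribute values are taken from the example above, and nothing here is meant to describe Archivematica's actual METS-writing code.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_dmdsec(dmd_id, label, href, mdtype):
    """Build a dmdSec holding one mdRef that points at an external metadata file."""
    dmdsec = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    ET.SubElement(
        dmdsec,
        f"{{{METS_NS}}}mdRef",
        {
            "LABEL": label,
            f"{{{XLINK_NS}}}href": href,
            "MDTYPE": mdtype,  # "DDI" or "FGDC" for the cases discussed here
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "SYSTEM",
        },
    )
    return dmdsec


# "dmdSec_1" is an illustrative ID, not one prescribed by the source.
example = build_dmdsec(
    "dmdSec_1",
    "CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e",
    "metadata/CCRI-CDN-Census1911V20110628.xml",
    "DDI",
)
print(ET.tostring(example, encoding="unicode"))
```

The same function would serve for an FGDC file by passing "FGDC" as the mdtype argument.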


METS and other metadata standards


Hierarchical AIC/AIP structure

  • Because datasets can be large and heterogeneous, one "dataset" may be broken into multiple AIPs. In such cases, the multiple AIPs can be intellectually combined into one AIC, or Archival Information Collection, defined by the OAIS reference model as "[a]n Archival Information Package whose Content Information is an aggregation of other Archival Information Packages." (OAIS 1-9).
    • The AIC will consist entirely of a METS file with aggregate-level descriptive metadata (e.g. metadata for the dataset or study as a whole) plus a logical structMap listing all child AIPs.
    • Each child AIP will include a logical structMap pointing to the parent AIC. The aggregate-level descriptive metadata will NOT be duplicated in the child AIP.
    • In storage, a pointer.xml file gives the URI and extraction (e.g. unzipping) information for an AIC or stand-alone AIP. Question: does the pointer.xml file give the URI and extraction information for AIPs that are children of AICs, or is that information captured in the AIC?
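
The parent/child relationship described above can be illustrated with two logical structMaps, one for the AIC and one for a child AIP. This is a minimal sketch under stated assumptions: the div TYPE values, the use of LABEL to carry identifiers, and the identifiers themselves are invented for illustration, and the source does not specify how Archivematica actually encodes these links.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def _div(parent, div_type, label):
    """Append a METS div with the given TYPE and LABEL to parent."""
    return ET.SubElement(
        parent, f"{{{METS_NS}}}div", {"TYPE": div_type, "LABEL": label}
    )


def aic_structmap(aic_id, child_aip_ids):
    """Logical structMap for an AIC: one div per constituent AIP."""
    structmap = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "logical"})
    aic_div = _div(structmap, "Archival Information Collection", aic_id)
    for aip_id in child_aip_ids:
        _div(aic_div, "Archival Information Package", aip_id)
    return structmap


def child_aip_structmap(aip_id, parent_aic_id):
    """Logical structMap for a child AIP: the parent AIC div wraps the AIP div."""
    structmap = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "logical"})
    parent_div = _div(structmap, "Archival Information Collection", parent_aic_id)
    _div(parent_div, "Archival Information Package", aip_id)
    return structmap


# Hypothetical identifiers, used only to show the shape of the output.
aic = aic_structmap("example-aic", ["example-aip-1", "example-aip-2"])
print(ET.tostring(aic, encoding="unicode"))
```

Note that the aggregate-level descriptive metadata appears in neither child structMap, consistent with the rule that it is not duplicated in child AIPs.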


[Figure: Archival storage area containing pointer files, AICs and AIPs]

  • Below is a sample METS file for an AIC, showing:
    • A link to the aggregate metadata file for the dataset (dmdSec)
    • Links to the AIC's constituent AIPs (the structMap).

[Figure: Sample METS file for an Archival Information Collection (AIC)]