Difference between revisions of "Dataset preservation"

From Archivematica
==Workflow==
  
 
*Some datasets may require manual normalization: see https://projects.artefactual.com/issues/1499.
 
  
==Metadata==
  
===METS and DDI/FGDC===
  
 
*DDI is the Data Documentation Initiative, a metadata specification for the social and behavioral sciences; see http://www.ddialliance.org/.
*FGDC is the Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998]; see http://www.fgdc.gov/metadata/csdgm/
 
*DDI and FGDC are considered descriptive metadata (dmdSec) in METS. From http://www.loc.gov/standards/mets/METSOverview.v2.html: "Valid values for the MDTYPE element [in mdSec] include...DDI (Data Documentation Initiative), FGDC (Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998])."
 
**In the Archivematica METS file, a DDI or FGDC file could be referenced from the dmdSec using mdRef, for example: ''<mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>''.
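The mdRef above could also be generated by a script. The following is a minimal sketch using only Python's standard-library xml.etree.ElementTree; it wraps the mdRef in a dmdSec, the METS section for descriptive metadata. The function name make_dmdsec and the ID value dmdSec_1 are hypothetical, and this is not Archivematica's own METS-writing code. Only the METS and XLink namespace URIs and the attribute values are taken from the standard and the example above.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Register prefixes so serialized output uses mets: and xlink: instead of ns0:/ns1:
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def make_dmdsec(dmd_id, label, href, mdtype):
    """Hypothetical helper: a METS dmdSec pointing at an external DDI or FGDC file via mdRef."""
    dmdsec = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    ET.SubElement(
        dmdsec,
        f"{{{METS_NS}}}mdRef",
        {
            "LABEL": label,
            f"{{{XLINK_NS}}}href": href,  # relative path inside the AIP
            "MDTYPE": mdtype,  # "DDI" and "FGDC" are both METS MDTYPE values
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "SYSTEM",
        },
    )
    return dmdsec


if __name__ == "__main__":
    sec = make_dmdsec(
        "dmdSec_1",  # hypothetical ID
        "CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e",
        "metadata/CCRI-CDN-Census1911V20110628.xml",
        "DDI",
    )
    print(ET.tostring(sec, encoding="unicode"))
```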
 
===METS and other metadata standards===
  
 
*Other metadata standards that could be used for ingested datasets include:
 
 
**SDMX (Statistical Data and Metadata eXchange) for aggregate data: http://sdmx.org/?page_id=10
 
**EML, the Ecological Metadata Language: http://knb.ecoinformatics.org/software/eml/eml-2.1.1/index.html
 
 
*If these standards are used, the mdRef in the METS file would need to set MDTYPE to OTHER and name the standard in OTHERMDTYPE, since they are not among the values METS defines for MDTYPE. For example: ''<mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="OTHER" OTHERMDTYPE="SDMX" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>''
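The rule in this bullet is: use a METS MDTYPE value when one exists, and otherwise use OTHER with the standard named in OTHERMDTYPE. It could be sketched as a small Python helper. METS_MDTYPES below is a deliberately partial, assumed subset of the METS MDTYPE vocabulary, not the full list. make_mdref is a hypothetical name.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

# Assumed, partial subset of the MDTYPE values enumerated by the METS schema.
METS_MDTYPES = {"DC", "DDI", "EAD", "FGDC", "MARC", "MODS", "PREMIS"}


def make_mdref(label, href, standard):
    """Hypothetical helper: build an mdRef, falling back to OTHER/OTHERMDTYPE when needed."""
    attrs = {"LABEL": label, f"{{{XLINK_NS}}}href": href}
    name = standard.upper()
    if name in METS_MDTYPES:
        attrs["MDTYPE"] = name
    else:
        # Standards such as SDMX or EML are not METS MDTYPE values.
        attrs["MDTYPE"] = "OTHER"
        attrs["OTHERMDTYPE"] = standard
    attrs["LOCTYPE"] = "OTHER"
    attrs["OTHERLOCTYPE"] = "SYSTEM"
    return ET.Element(f"{{{METS_NS}}}mdRef", attrs)


if __name__ == "__main__":
    ref = make_mdref("example-label", "metadata/example-sdmx.xml", "SDMX")  # hypothetical file
    print(ET.tostring(ref, encoding="unicode"))
```

Keeping the vocabulary in a set makes the fallback explicit. A fuller implementation would load the complete MDTYPE list from the METS schema itself.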
===Validating DDI and FGDC on ingest===
  
 
*Since DDI and FGDC are complex schemas and the archivist may need to edit the metadata before ingest, it may be useful to add metadata validation as a transfer or ingest micro-service, e.g. using xmllint.

Revision as of 12:03, 8 January 2013

Workflow


Metadata

METS and DDI/FGDC

  • DDI is the Data Documentation Initiative, a metadata specification for the social and behavioral sciences; see http://www.ddialliance.org/.
  • FGDC is the Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998]; see http://www.fgdc.gov/metadata/csdgm/
  • DDI and FGDC are considered descriptive metadata (dmdSec) in METS. From http://www.loc.gov/standards/mets/METSOverview.v2.html: "Valid values for the MDTYPE element [in mdSec] include...DDI (Data Documentation Initiative), FGDC (Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998])."
    • In the Archivematica METS file, a DDI or FGDC file could be referenced from the dmdSec using mdRef, for example: <mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>.


METS and other metadata standards


Validating DDI and FGDC on ingest

  • Since DDI and FGDC are complex schemas and the archivist may need to edit the metadata before ingest, it may be useful to add metadata validation as a transfer or ingest micro-service, e.g. using xmllint.
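    • Sample validation command. Here --schema takes the path or URL of the DDI 3.1 XML Schema file; ddi:instance:3_1 is the DDI 3.1 instance namespace, not a schema location: xmllint --noout --schema path/to/ddi-3.1-schema.xsd metadata/CCRI-CDN-Census1911V20110628.xml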
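A transfer or ingest micro-service wrapping this check might look roughly like the following Python sketch. It rests on several assumptions:
- xmllint is installed and on PATH.
- The caller passes the path to the appropriate DDI or FGDC XSD.
- The function names are hypothetical; this is not an existing Archivematica micro-service.

A standard-library well-formedness check runs first, so badly broken files fail with a clear message before xmllint is invoked.

```python
import shutil
import subprocess
import sys
import xml.etree.ElementTree as ET


def is_well_formed(xml_path):
    """Cheap first check: can the file be read and parsed as XML at all?"""
    try:
        ET.parse(xml_path)
    except (ET.ParseError, OSError):
        return False
    return True


def xmllint_command(xml_path, schema_path):
    """Build the xmllint call: --schema needs an XSD file, --noout suppresses echoing the document."""
    return ["xmllint", "--noout", "--schema", schema_path, xml_path]


def validate(xml_path, schema_path):
    """Return (ok, messages); xmllint exits with status 0 only when the file validates."""
    if not is_well_formed(xml_path):
        return False, f"{xml_path} is not well-formed XML"
    if shutil.which("xmllint") is None:
        return False, "xmllint not found on PATH"
    result = subprocess.run(
        xmllint_command(xml_path, schema_path),
        capture_output=True,
        text=True,
    )
    # xmllint reports validation results on stderr.
    return result.returncode == 0, result.stderr


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python validate_metadata.py metadata/file.xml path/to/schema.xsd
    ok, messages = validate(sys.argv[1], sys.argv[2])
    print(messages)
    sys.exit(0 if ok else 1)
```

Returning a boolean plus xmllint's messages lets the calling micro-service decide whether a failure should halt the transfer or only be reported to the archivist.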