Dataset preservation
Revision as of 12:03, 8 January 2013
Workflow
- Some datasets may require manual normalization: see https://projects.artefactual.com/issues/1499.
Metadata
METS and DDI/FGDC
- DDI is the Data Documentation Initiative, a metadata specification for the social and behavioral sciences; see http://www.ddialliance.org/.
- FGDC is the Federal Geographic Data Committee Metadata Standard, formally the Content Standard for Digital Geospatial Metadata (CSDGM, FGDC-STD-001-1998); see http://www.fgdc.gov/metadata/csdgm/
- DDI and FGDC are treated as descriptive metadata in METS, so they belong in a descriptive metadata section (dmdSec, an mdSec-type element). From http://www.loc.gov/standards/mets/METSOverview.v2.html: "Valid values for the MDTYPE element [in mdSec] include...DDI (Data Documentation Initiative), FGDC (Federal Geographic Data Committee Metadata Standard [FGDC-STD-001-1998])."
- In the Archivematica METS file, a DDI or FGDC file could be referenced from that dmdSec with an mdRef element, for example: <mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>.
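A minimal sketch of how such a reference could be generated, using Python's standard-library xml.etree.ElementTree. It wraps the mdRef in a dmdSec, declares the METS and XLink namespaces so that xlink:href is serialized with the right prefix, and copies the attribute values from the example mdRef on this page. The function name ddi_dmdsec and the ID value dmdSec_1 are hypothetical; this is not Archivematica's own METS-generation code.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and XLink.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def ddi_dmdsec(dmd_id: str, label: str, href: str, mdtype: str = "DDI") -> ET.Element:
    """Build a <dmdSec> holding an <mdRef> to an external DDI or FGDC file.

    dmd_id is a hypothetical identifier chosen by the caller; the other
    attribute values mirror the example mdRef on this page.
    """
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    ET.SubElement(
        dmd,
        f"{{{METS_NS}}}mdRef",
        {
            "LABEL": label,
            f"{{{XLINK_NS}}}href": href,  # serialized as xlink:href
            "MDTYPE": mdtype,             # "DDI" or "FGDC"
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "SYSTEM",
        },
    )
    return dmd


if __name__ == "__main__":
    name = "CCRI-CDN-Census1911V20110628.xml"
    sec = ddi_dmdsec(
        "dmdSec_1",
        f"{name}-73b93b28-be1b-433f-861e-03bc321dfe7e",
        f"metadata/{name}",
    )
    print(ET.tostring(sec, encoding="unicode"))
```

Passing mdtype="FGDC" produces the equivalent reference for an FGDC record.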
METS and other metadata standards
- Other metadata standards that could be used for ingested datasets include:
  - North American Profile (NAP) of ISO 19115, for geospatial metadata: http://www.fgdc.gov/metadata/geospatial-metadata-standards
  - SDMX for aggregate data: http://sdmx.org/?page_id=10
  - EML, the Ecological Metadata Language: http://knb.ecoinformatics.org/software/eml/eml-2.1.1/index.html
- If these standards are used, the mdRef in the METS file would need to set MDTYPE to OTHER and name the standard in OTHERMDTYPE, for example: <mdRef LABEL="CCRI-CDN-Census1911V20110628.xml-73b93b28-be1b-433f-861e-03bc321dfe7e" xlink:href="metadata/CCRI-CDN-Census1911V20110628.xml" MDTYPE="OTHER" OTHERMDTYPE="SDMX" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
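The MDTYPE rule can be sketched as a small helper. Assumption: only DDI and FGDC are treated as native MDTYPE values here, because they are the ones this page discusses; the METS schema enumerates more values, so it is worth checking that list before declaring any other standard as OTHER. The helper name mdtype_attributes is hypothetical.

```python
from typing import Dict

# MDTYPE values from the METS enumeration that this page discusses.
# The full enumeration in the METS schema is longer; this subset is an assumption.
NATIVE_MDTYPES = {"DDI", "FGDC"}


def mdtype_attributes(standard: str) -> Dict[str, str]:
    """Return MDTYPE (and OTHERMDTYPE when needed) for an mdRef pointing at `standard`."""
    name = standard.strip()
    if name.upper() in NATIVE_MDTYPES:
        # Standards in the enumeration go straight into MDTYPE.
        return {"MDTYPE": name.upper()}
    # Anything else is declared as OTHER and named in OTHERMDTYPE.
    return {"MDTYPE": "OTHER", "OTHERMDTYPE": name}


if __name__ == "__main__":
    for std in ("DDI", "FGDC", "SDMX", "EML"):
        print(std, mdtype_attributes(std))
```

With this helper, SDMX and EML both come back as MDTYPE="OTHER" with their names in OTHERMDTYPE, matching the SDMX example.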
Validating DDI and FGDC on ingest
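- Since DDI and FGDC are complex schemas and the archivist may need to edit the metadata before ingest, it may be useful to add metadata validation as a transfer or ingest micro-service, for example using xmllint.
- Sample validation command: xmllint --noout --schema path/to/instance.xsd metadata/CCRI-CDN-Census1911V20110628.xml. Note that --schema takes the location of an XSD file (here a local copy of the DDI 3.1 instance.xsd; path/to/ is a placeholder), not the DDI 3.1 namespace ddi:instance:3_1.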
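A hedged sketch of the validation step in Python, calling xmllint through subprocess after a standard-library well-formedness check. It assumes xmllint is on the PATH and that the DDI (or FGDC) XSD files have been downloaded locally; the schema path and the function names are hypothetical, and how the check would be wired into an Archivematica micro-service is not shown.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from typing import List


def xmllint_command(xml_path: str, xsd_path: str) -> List[str]:
    """Build the xmllint schema-validation command.

    --schema expects an XSD file or URL, not a namespace URN such as
    ddi:instance:3_1; --noout suppresses echoing the parsed document.
    """
    return ["xmllint", "--noout", "--schema", xsd_path, xml_path]


def is_well_formed(xml_path: str) -> bool:
    """Cheap pre-check with the standard library: does the file parse at all?"""
    try:
        ET.parse(xml_path)
        return True
    except ET.ParseError:
        return False


def validate(xml_path: str, xsd_path: str) -> bool:
    """Return True if the file is well formed and xmllint accepts it against the schema."""
    if not is_well_formed(xml_path):
        return False
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    result = subprocess.run(
        xmllint_command(xml_path, xsd_path), capture_output=True, text=True
    )
    # A non-zero exit status signals a parse, schema, or validation failure.
    return result.returncode == 0
```

For example, validate("metadata/CCRI-CDN-Census1911V20110628.xml", "schemas/ddi-3.1/instance.xsd") would return False for a malformed file without invoking xmllint at all. The sketch discards xmllint's error messages; a real micro-service would probably want to log them.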