Line 38:
== Transfer METS file ==

− <pre><?xml version='1.0' encoding='ASCII'?>
− <mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink"
− xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd">
− <mets:metsHdr CREATEDATE="2015-08-21T23:08:27"/>
− <mets:dmdSec ID="dmdSec_1">
− <mets:mdWrap MDTYPE="DC">
− <mets:xmlData>
− <dcterms:dublincore xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/"
− xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd">
− <dc:title>Pacific weather patterns study version 1.0</dc:title>
− <dc:creator>Doe, Jane</dc:creator>
− <dc:publisher>SP Dataverse Network</dc:publisher>
− <dc:identifier>hdl:10864/10125</dc:identifier>
− <dc:rights>Rights field</dc:rights>
− </dcterms:dublincore>
− </mets:xmlData>
− </mets:mdWrap>
− </mets:dmdSec>
− <mets:dmdSec ID="dmdSec_2">
− <mets:mdRef LABEL="dataset.json" xlink:href="location/dataset.json" MDTYPE="OTHER" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:dmdSec>
− <mets:dmdSec ID="dmdSec_3">
− <mets:mdRef LABEL="YVR_weather_data-ddi.xml" xlink:href="location/YVR weather data/YVR_weather_data-ddi.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:dmdSec>
− <mets:fileSec>
− <mets:fileGrp USE="original">
− <mets:file ID="YVR_weather_data.sav" CHECKSUM="fcf541085b0466c40f409037ea20a456" CHECKSUMTYPE="MD5">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_data.sav" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− <mets:file ID="Study_info.pdf" CHECKSUM="caaf89827e44d2de79a0b3112957fffd" CHECKSUMTYPE="MD5">
− <mets:FLocat xlink:href="location/Study_info.pdf" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− </mets:fileGrp>
− <mets:fileGrp USE="derivative">
− <mets:file ID="YVR_weather_data_utf8.tab" CHECKSUM="27a68be26ea4e370270299a996173380" CHECKSUMTYPE="MD5">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_data_utf8.tab" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− <mets:file ID="YVR_weather_data_utf8.rData" CHECKSUM="g8a68be26ea4e379770299a996171756" CHECKSUMTYPE="MD5">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_data_utf8.rData" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− </mets:fileGrp>
− <mets:fileGrp USE="metadata">
− <mets:file ID="dataset.json">
− <mets:FLocat xlink:href="location/dataset.json" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− <mets:file ID="YVR_weather_data-ddi.xml">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_data-ddi.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− <mets:file ID="YVR_weather_datacitation-endnote.xml">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_datacitation-endnote.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− <mets:file ID="YVR_weather_datacitation-ris.ris">
− <mets:FLocat xlink:href="location/YVR weather data/YVR_weather_datacitation-ris.ris" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
− </mets:file>
− </mets:fileGrp>
− </mets:fileSec>
− <mets:structMap ID="structMap_1" LABEL="Archivematica transfer" TYPE="physical">
− <mets:div LABEL="Pacific weather patterns study" TYPE="Directory" DMDID="dmdSec_1 dmdSec_2">
− <mets:div>
− <mets:fptr FILEID="Study_info.pdf"/>
− <mets:div>
− <mets:fptr FILEID="dataset.json"/>
− </mets:div>
− </mets:div>
− <mets:div LABEL="YVR weather data" TYPE="Directory">
− <mets:div>
− <mets:fptr FILEID="YVR_weather_data.sav"/>
− </mets:div>
− <mets:div DMDID="dmdSec_3">
− <mets:fptr FILEID="YVR_weather_data_utf8.tab"/>
− </mets:div>
− <mets:div>
− <mets:fptr FILEID="YVR_weather_data-ddi.xml"/>
− </mets:div>
− <mets:div>
− <mets:fptr FILEID="YVR_weather_datacitation-endnote.xml"/>
− </mets:div>
− <mets:div>
− <mets:fptr FILEID="YVR_weather_datacitation-ris.ris"/>
− </mets:div>
− </mets:div>
− </mets:div>
− </mets:structMap>
− </mets:mets>
− </pre>
+
+ [[File:METS1G.png|800px|thumb|center]]
+ [[File:METS2G.png|800px|thumb|center]]
This page tracks development of a proof of concept integration of Archivematica with Dataverse.
See also
Overview
This wiki captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.
Workflow
- The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing standard Archivematica Archival Information Packages (AIPs) and placing them in archival storage, and updating the Dataverse study with the AIP UUIDs.
- Analysis is based on Dataverse tests using https://apitest.dataverse.org/ and https://dataverse-demo.iq.harvard.edu/, online documentation at http://guides.dataverse.org/en/latest/api/index.html and discussions with Dataverse developers and users.
- Proposed integration is for Archivematica 1.5 and higher and Dataverse 4.x.
Workflow diagram
Workflow diagram notes
[1] "Ingest script" refers to a tool that automates ingest into Archivematica for bulk processing. An existing automation tool would be modified to perform the tasks described in the workflow.
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.
[3] The JSON file contains citation and other study-level metadata, an entity_id field that identifies the study in Dataverse, version information, and a list of data files, each with its own entity_id value and MD5 checksum.
[4] If the JSON file reports a content_type of tab-separated values, Archivematica issues an API call for a multiple-file ("bundled") content download. For TSV files, this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).
[6] Archivematica ingests all content returned from Dataverse, including the JSON file, plus the METS file generated in step 1.6.
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.
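Notes [3], [4] and [7] together imply two small decisions the ingest script has to make per study: which files need the bundled download, and whether the received bytes match the MD5 checksums recorded in the JSON file. The Python sketch below illustrates both under stated assumptions. The JSON layout is hypothetical: only `entity_id`, `content_type` and the presence of MD5 checksums come from the notes above, while the keys `files`, `name` and `md5`, the MIME string in `TSV_CONTENT_TYPE`, and the function names are invented for illustration. It is not the Archivematica implementation and makes no Dataverse API calls.

```python
import hashlib

# Assumed MIME type for tabular files; the real value reported by Dataverse may differ.
TSV_CONTENT_TYPE = "text/tab-separated-values"


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def plan_downloads(study: dict) -> dict:
    """Split file entity_ids into bundled vs. single downloads by content_type."""
    plan = {"bundled": [], "single": []}
    for entry in study["files"]:
        kind = "bundled" if entry["content_type"] == TSV_CONTENT_TYPE else "single"
        plan[kind].append(entry["entity_id"])
    return plan


def verify_checksums(study: dict, payloads: dict) -> list:
    """Return the names of files that are missing or whose MD5 does not match."""
    failures = []
    for entry in study["files"]:
        data = payloads.get(entry["name"])
        if data is None or md5_of(data) != entry["md5"]:
            failures.append(entry["name"])
    return failures


# Hypothetical in-memory stand-ins for downloaded content.
payloads = {
    "YVR_weather_data_utf8.tab": b"station\ttemp\nYVR\t12.5\n",
    "Study_info.pdf": b"%PDF-1.4 placeholder",
}

# Hypothetical study description mirroring the fields named in note [3].
study = {
    "entity_id": 1,
    "files": [
        {
            "entity_id": 101,
            "name": "YVR_weather_data_utf8.tab",
            "content_type": TSV_CONTENT_TYPE,
            "md5": md5_of(payloads["YVR_weather_data_utf8.tab"]),
        },
        {
            "entity_id": 102,
            "name": "Study_info.pdf",
            "content_type": "application/pdf",
            "md5": md5_of(payloads["Study_info.pdf"]),
        },
    ],
}

print(plan_downloads(study))
print(verify_checksums(study, payloads))
```

Keeping the download plan and the checksum check as separate functions mirrors the workflow, where the download decision happens before transfer and checksum verification runs as a micro-service afterwards.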
Transfer METS file
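As a rough illustration of the structure described in note [5], the following Python sketch builds a minimal transfer METS document with a DC dmdSec, a fileSec grouped by use, and a flat structMap. The namespace URIs and attribute names are the standard METS, XLink and Dublin Core ones; the function name `build_transfer_mets`, its parameters, and the choice to reuse file names as IDs are assumptions made for this example, not the integration's actual code. The sketch also flattens the directory hierarchy and omits checksums and mdRef sections.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, tag: str) -> str:
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{NS[prefix]}}}{tag}"


def build_transfer_mets(title: str, dc_fields: dict, file_groups: dict) -> ET.Element:
    """Build a minimal METS tree: one DC dmdSec, a fileSec and a flat structMap.

    file_groups maps a USE value (e.g. "original") to a list of relative paths.
    """
    root = ET.Element(q("mets", "mets"))

    # dmdSec wrapping the Dublin Core elements extracted from the JSON file.
    dmd = ET.SubElement(root, q("mets", "dmdSec"), ID="dmdSec_1")
    wrap = ET.SubElement(dmd, q("mets", "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q("mets", "xmlData"))
    dc_root = ET.SubElement(xml_data, q("dcterms", "dublincore"))
    for name, value in dc_fields.items():
        ET.SubElement(dc_root, q("dc", name)).text = value

    file_sec = ET.SubElement(root, q("mets", "fileSec"))
    struct_map = ET.SubElement(
        root, q("mets", "structMap"), ID="structMap_1", TYPE="physical"
    )
    top = ET.SubElement(
        struct_map, q("mets", "div"), LABEL=title, TYPE="Directory", DMDID="dmdSec_1"
    )

    for use, paths in file_groups.items():
        group = ET.SubElement(file_sec, q("mets", "fileGrp"), USE=use)
        for path in paths:
            file_id = path.rsplit("/", 1)[-1]  # assumption: file name doubles as ID
            file_el = ET.SubElement(group, q("mets", "file"), ID=file_id)
            ET.SubElement(
                file_el,
                q("mets", "FLocat"),
                {q("xlink", "href"): path, "LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM"},
            )
            # One structMap div per file, pointing back at the fileSec entry.
            div = ET.SubElement(top, q("mets", "div"))
            ET.SubElement(div, q("mets", "fptr"), FILEID=file_id)
    return root


mets = build_transfer_mets(
    "Pacific weather patterns study",
    {"title": "Pacific weather patterns study version 1.0", "creator": "Doe, Jane"},
    {
        "original": ["location/Study_info.pdf"],
        "derivative": ["location/YVR weather data/YVR_weather_data_utf8.tab"],
        "metadata": ["location/dataset.json"],
    },
)
print(ET.tostring(mets, encoding="unicode"))
```

Grouping files by USE in the fileSec is what lets later processing distinguish original uploads from derivatives and metadata files, as note [5] describes.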