Difference between revisions of "Dataverse"

From Archivematica

Revision as of 15:37, 13 August 2015

This page tracks development of a proof of concept integration of Archivematica with Dataverse.

See also

Overview

This wiki captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.

Workflow

The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing standard Archivematica Archival Information Packages (AIPs) and placing them in archival storage, and updating the Dataverse study with the AIP UUID.
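As a rough illustration only, the four steps of the proposed workflow could be chained as in the sketch below. Everything in it is hypothetical: the Study class, the preserve_studies function and the three callables stand in for the real Dataverse and Archivematica calls, which this page does not specify. AIP preparation is reduced to assigning a UUID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Study:
    """Hypothetical stand-in for a Dataverse study (dataset)."""
    entity_id: int
    file_names: List[str] = field(default_factory=list)


def preserve_studies(
    fetch_studies: Callable[[], List[Study]],
    store_aip: Callable[[str, Study], None],
    update_study: Callable[[Study, str], None],
) -> List[str]:
    """Run the workflow steps for each study and return the AIP UUIDs.

    The three callables are assumptions made for this sketch; they are
    not part of any Dataverse or Archivematica API.
    """
    aip_uuids = []
    for study in fetch_studies():        # issue API calls, receive content
        aip_uuid = str(uuid.uuid4())     # prepare an AIP (only the UUID is modelled)
        store_aip(aip_uuid, study)       # place the AIP in archival storage
        update_study(study, aip_uuid)    # update the Dataverse study with the AIP UUID
        aip_uuids.append(aip_uuid)
    return aip_uuids
```

Passing the steps in as callables keeps the sketch independent of any particular API client, so each step can be replaced by an in-memory stub when experimenting.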

Workflow diagram

Dataverse-Archivematica workflow.png

Workflow diagram notes

[1] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.

[2] The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and an MD5 checksum for each data file.

[3] If the JSON file gives a data file a content_type of tab-separated values, Archivematica issues an API call for multiple-file ("bundled") content download. For TSV files, this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.

[4] The standard and pre-configured micro-services are to include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata, and normalize for preservation.

[5] Dublin Core (DC) metadata is parsed for the study only, not for individual data files.
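Notes [2] to [4] can be illustrated with a small sketch that reads the study JSON, decides which data files need the bundled download, and verifies a downloaded file against its MD5 checksum. The field names entity_id and content_type come from the notes above. The "files" and "md5" keys, the function names, and the content type value text/tab-separated-values (the registered MIME type for TSV) are assumptions made for this example, not the confirmed Dataverse JSON layout.

```python
import hashlib
import json
from typing import Dict

# Assumed content_type value; this is the registered MIME type for TSV.
TSV_CONTENT_TYPE = "text/tab-separated-values"


def plan_downloads(study_json: str) -> Dict[int, str]:
    """Map each data file's entity_id to 'bundled' or 'single'.

    Assumes a hypothetical layout in which the study JSON has a "files"
    list whose entries carry "entity_id" and "content_type".
    """
    study = json.loads(study_json)
    plan = {}
    for data_file in study["files"]:
        is_tsv = data_file.get("content_type") == TSV_CONTENT_TYPE
        plan[data_file["entity_id"]] = "bundled" if is_tsv else "single"
    return plan


def md5_matches(content: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 digest of content equals the recorded checksum."""
    return hashlib.md5(content).hexdigest() == expected_md5.strip().lower()
```

A file whose checksum does not match would be expected to fail the "verify checksums" micro-service; how that failure is reported is outside the scope of this sketch.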