Difference between revisions of "Dataverse"

From Archivematica
 

Revision as of 16:29, 13 August 2015


This page tracks development of a proof of concept integration of Archivematica with Dataverse.

See also

Workflow

Workflow diagram

[Figure: Dataverse-Archivematica workflow.png]

Workflow diagram notes

[1] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.

[2] The JSON file contains citation and other study-level metadata, an entity_id field that identifies the study in Dataverse, version information, and a list of data files, each with its own entity_id value and MD5 checksum.

[3] If the JSON file gives a content_type of tab-separated values, Archivematica issues an API call for a multiple-file ("bundled") content download. For TSV files, this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.

[4] The standard, pre-configured micro-services are to include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata, and normalize for preservation.

[5] Dublin Core (DC) metadata is parsed for the study only, not for individual data files.
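
The decisions described in notes [1] to [3] can be illustrated with a minimal Python sketch. It is not the integration's actual code. The entity_id and content_type field names come from the notes above; the "published_at", "files" and "md5" keys, the function names, and the exact MIME string for tab-separated values are assumptions made for illustration only.

```python
import hashlib

# Assumed MIME string for "tab separated values"; this page does not give the exact value.
TABULAR_CONTENT_TYPE = "text/tab-separated-values"


def new_or_updated(studies, last_call):
    """Note [1]: keep studies published (first time or as a new version) since the last API call.

    "published_at" is a hypothetical key holding a comparable timestamp.
    """
    return [s for s in studies if s["published_at"] > last_call]


def download_mode(file_entry):
    """Note [3]: tabular files use the bundled download; all other files are fetched singly."""
    if file_entry.get("content_type") == TABULAR_CONTENT_TYPE:
        return "bundled"
    return "single"


def md5_of(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_checksums(study, local_paths):
    """Note [2]: return the entity_id of every file whose MD5 differs from the JSON value.

    "files" and "md5" are hypothetical key names; local_paths maps entity_id to a file path.
    """
    return [
        f["entity_id"]
        for f in study["files"]
        if md5_of(local_paths[f["entity_id"]]) != f["md5"].lower()
    ]
```

In this sketch, an empty list from failed_checksums means every downloaded file matched the checksum recorded in the JSON file, which corresponds to the "verify checksums" micro-service in note [4].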