<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.archivematica.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Joel-simpson</id>
	<title>Archivematica - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.archivematica.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Joel-simpson"/>
	<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/Special:Contributions/Joel-simpson"/>
	<updated>2026-06-03T06:06:38Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.4</generator>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12725</id>
		<title>Archivematica 1.8 and Storage Service 0.13 release notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12725"/>
		<updated>2018-10-24T20:09:17Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Dataverse integration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Work in progress'''&lt;br /&gt;
&lt;br /&gt;
==Supported environments==&lt;br /&gt;
&lt;br /&gt;
Link to installation instructions.&lt;br /&gt;
&lt;br /&gt;
Specify supported environments.&lt;br /&gt;
&lt;br /&gt;
Make special note of any changes to supported environment.&lt;br /&gt;
&lt;br /&gt;
==Added==&lt;br /&gt;
&lt;br /&gt;
Describe new features.&lt;br /&gt;
&lt;br /&gt;
===New feature template===&lt;br /&gt;
&lt;br /&gt;
This is a description of this amazing feature! Here's why it's a net benefit to the project and the community. Also included are any special notes, like if it's a beta feature.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by some amazing institution. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: link&lt;br /&gt;
* Pull requests: link&lt;br /&gt;
&lt;br /&gt;
===Dataverse integration===&lt;br /&gt;
&lt;br /&gt;
Archivematica can now be configured to use a [https://dataverse.org/ Dataverse] research data repository as a transfer source location. Dataverse transfer source locations can be configured to display all available datasets or a subset of them. Datasets are retrieved directly using the Dataverse API and processed using a new “Dataverse” transfer type. New Dataverse-specific processing includes:&lt;br /&gt;
&lt;br /&gt;
* fixity checking using checksums generated by Dataverse&lt;br /&gt;
* retrieval of derivative and metadata files associated with tabular data files&lt;br /&gt;
* creation of a Dataverse METS file describing the dataset as retrieved from Dataverse&lt;br /&gt;
* Dataverse metadata included in the AIP METS&lt;br /&gt;
&lt;br /&gt;
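The fixity check against Dataverse checksums can be illustrated with a minimal Python sketch. It is not Archivematica's implementation: it assumes each file entry mirrors the files list of a dataset version returned by the Dataverse native API, with dataFile.filename and a dataFile.checksum object holding a type (e.g. MD5) and a value, and that the files were downloaded flat into one directory. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def _hash_file(path, algorithm):
    # Dataverse reports names like "MD5" or "SHA-256"; hashlib expects "md5" or "sha256".
    digest = hashlib.new(algorithm.lower().replace("-", ""))
    with open(path, "rb") as handle:
        # Read in chunks so large research files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataverse_fixity(file_entries, download_dir):
    """Return the filenames whose local checksum differs from the Dataverse value.

    Assumption of this sketch: each entry looks like
    {"dataFile": {"filename": ..., "checksum": {"type": ..., "value": ...}}}.
    """
    failures = []
    for entry in file_entries:
        data_file = entry["dataFile"]
        checksum = data_file["checksum"]
        local_path = Path(download_dir) / data_file["filename"]
        if _hash_file(local_path, checksum["type"]) != checksum["value"].lower():
            failures.append(data_file["filename"])
    return failures
```
An empty list means every downloaded file matched the checksum Dataverse reported for it.&lt;br /&gt;
&lt;br /&gt;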
Some advanced or more complex use cases are not fully supported, such as datasets with restricted files, dataset versioning and dataset reingest. For a full list of known issues and enhancement ideas, refer to the [https://github.com/archivematica/Issues/labels/OCUL%3A%20AM-Dataverse Dataverse label in the Archivematica Issues repository] and the [https://wiki.archivematica.org/Dataverse Dataverse page on the Archivematica wiki].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issues: see the [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse waffle board] for all issues with the Dataverse label.&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/dataverse/ Dataverse Integration]&lt;br /&gt;
&lt;br /&gt;
===Processing configuration reset and download buttons===&lt;br /&gt;
&lt;br /&gt;
A new installation of Archivematica comes with a pre-set processing configuration called &amp;quot;default&amp;quot;, and a second one (used only in Jisc workflows) called &amp;quot;automated&amp;quot;. During testing, users are encouraged to adapt these configurations to their workflows, but they may later need to restore the installation pre-sets. A new reset button lets users easily return the default and automated processing configurations to their installation pre-sets.&lt;br /&gt;
&lt;br /&gt;
The second part of this feature is the addition of a download button for the processing configuration files. If you create a custom processing configuration, you can download the resulting processingMCP.xml file using the button and then include it at the top level of your transfer. Archivematica will then use this to automate your transfer selections, rather than the default configuration.&lt;br /&gt;
&lt;br /&gt;
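As a sketch of the download-then-include workflow, the hypothetical helper below copies a downloaded configuration file to the top level of a transfer directory under the name processingMCP.xml. The check that the root element is processingMCP is an assumption about the file's layout, added only to catch copying the wrong file; adjust it to the files your installation produces.&lt;br /&gt;
&lt;br /&gt;
```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def add_processing_config(config_path, transfer_dir):
    """Copy a downloaded processing configuration into a transfer (hypothetical helper).

    The file is placed at the top level of the transfer and named
    processingMCP.xml, whatever the downloaded file was called.
    """
    root = ET.parse(config_path).getroot()
    # Assumed sanity check: a processing configuration has a processingMCP root element.
    if root.tag != "processingMCP":
        raise ValueError("not a processingMCP document: " + root.tag)
    destination = Path(transfer_dir) / "processingMCP.xml"
    shutil.copyfile(config_path, destination)
    return destination
```
&lt;br /&gt;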
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration documentation]&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1138 #1138]&lt;br /&gt;
&lt;br /&gt;
==Changed==&lt;br /&gt;
&lt;br /&gt;
Describe enhancements or major fixes.&lt;br /&gt;
&lt;br /&gt;
===Streamline checksum verification===&lt;br /&gt;
&lt;br /&gt;
This enhancement removes duplicated checksum verification in Archivematica, which improves performance when processing large transfers (many files and/or large files). It includes three changes:&lt;br /&gt;
&lt;br /&gt;
* Remove the &amp;quot;Verify checksums generated on ingest&amp;quot; micro-service&lt;br /&gt;
* Enhance the &amp;quot;Verify AIP&amp;quot; micro-service to query the database in bulk for transfer-generated checksums and then verify that they match the checksums recorded in the bag's manifest-&amp;lt;ALGORITHM&amp;gt;.txt file.&lt;br /&gt;
* Have &amp;quot;Verify AIP&amp;quot; create an AIP-level &amp;quot;fixity check&amp;quot; PREMIS:EVENT that it can pass to the Storage Service, which will document this verification in the pointer file.&lt;br /&gt;
&lt;br /&gt;
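The bulk comparison in the second bullet can be sketched as follows. This is a simplified illustration, not the Verify AIP micro-service code: db_checksums is a hypothetical dictionary standing in for the single bulk database query, keyed by the same relative paths the bag manifest uses, and the manifest is parsed according to the BagIt layout of one checksum followed by whitespace and a file path per line.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path


def parse_manifest(manifest_path):
    """Map each payload path in a BagIt manifest file to its (lowercased) checksum."""
    entries = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first run of whitespace so paths containing spaces survive.
        checksum, path = line.split(None, 1)
        entries[path] = checksum.lower()
    return entries


def find_mismatches(db_checksums, manifest_path):
    """Return the sorted paths whose stored checksum is missing from, or differs from, the manifest.

    db_checksums is a hypothetical {relative_path: checksum} dict that stands in
    for one bulk database query instead of a query per file.
    """
    manifest = parse_manifest(manifest_path)
    return sorted(
        path
        for path, checksum in db_checksums.items()
        if manifest.get(path) != checksum.lower()
    )
```
An empty result means every transfer-time checksum matched the manifest.&lt;br /&gt;
&lt;br /&gt;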
This should not affect regular workflows, but note that uncompressed AIPs, which don't have pointer files, get no AIP-level fixity check PREMIS event; object-level fixity events are still recorded for them. This limitation is tracked in the Storage Service repository: [https://github.com/artefactual/archivematica-storage-service/issues/324 Problem: uncompressed AIPs need pointer files #324]&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Columbia University Library. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/918 #918]&lt;br /&gt;
* Pull requests: [https://github.com/artefactual/archivematica/pull/1012 PR 1012]&lt;br /&gt;
&lt;br /&gt;
===File format identification updates===&lt;br /&gt;
&lt;br /&gt;
Archivematica 1.8 is now up to date with PRONOM v94! For more information on the data added to PRONOM, see the [http://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml PRONOM release notes].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by the Denver Art Museum. Thank you!&lt;br /&gt;
&lt;br /&gt;
===Indexing can be enabled/disabled for Transfers and/or Archival Storage===&lt;br /&gt;
&lt;br /&gt;
Previously, Elasticsearch indexing could only be disabled globally, as a scalability measure, since indexing consumes a lot of resources. However, this also disabled the Backlog and Appraisal features, which rely on indexing and which some users still wanted to access. As of release 1.8, Archivematica can be deployed with indexing enabled only for Transfers (Backlog and Appraisal enabled), only for Archival Storage (Backlog and Appraisal disabled), for both indexes, or for neither.&lt;br /&gt;
&lt;br /&gt;
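The four deployment options map onto which indexes are enabled. The sketch below models that mapping with a hypothetical comma-separated setting; the setting format and the function names are illustrative assumptions, so consult the installation documentation for the actual configuration variables.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_enabled_indexes(value):
    """Turn a hypothetical setting such as "transfers,aips" into a set of index names.

    An empty string means no indexing at all.
    """
    allowed = {"transfers", "aips"}
    indexes = {part.strip() for part in value.split(",") if part.strip()}
    unknown = indexes - allowed
    if unknown:
        raise ValueError("unknown index name(s): " + ", ".join(sorted(unknown)))
    return indexes


def backlog_and_appraisal_available(indexes):
    # Backlog and Appraisal depend on the Transfers index, so they are
    # available only when that index is enabled.
    return "transfers" in indexes


def archival_storage_search_available(indexes):
    # Searching Archival Storage depends on the AIPs index.
    return "aips" in indexes
```
&lt;br /&gt;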
* Issue: [https://github.com/artefactual/archivematica/issues/1172 #1172]&lt;br /&gt;
&lt;br /&gt;
===Configure email settings===&lt;br /&gt;
&lt;br /&gt;
This change adds more options for configuring Archivematica's email client, including letting an administrator set the sender address for emails sent by Archivematica (e.g. normalization reports, failure reports) to comply with local IT requirements.&lt;br /&gt;
&lt;br /&gt;
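A minimal sketch of a configurable sender address, using Python's standard email.message module. EXAMPLE_EMAIL_FROM and the fallback address are hypothetical placeholders, not Archivematica settings; see the email notification documentation linked below for the real options.&lt;br /&gt;
&lt;br /&gt;
```python
import os
from email.message import EmailMessage


def build_report_email(recipient, subject, body):
    """Build a report email whose sender address comes from configuration.

    EXAMPLE_EMAIL_FROM is a hypothetical environment variable; the fallback
    address is a placeholder for illustration only.
    """
    message = EmailMessage()
    # Read the sender at call time so an administrator can change it without code edits.
    message["From"] = os.environ.get("EXAMPLE_EMAIL_FROM", "archivematica@example.org")
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message
```
&lt;br /&gt;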
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Pull request: [https://github.com/artefactual/archivematica-docs/pull/208 archivematica-docs PR 208]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/docs/archivematica-1.8/admin-manual/installation-setup/customization/customization/#email-notification-configuration Email notification configuration]&lt;br /&gt;
&lt;br /&gt;
===Download processing configuration and reset to default===&lt;br /&gt;
&lt;br /&gt;
Previous versions of Archivematica introduced the ability to add custom processing configurations, but users had to retrieve the custom configuration file via the command line to use it. There is now a download button under Administration &amp;gt; Processing configuration so that you can download a processing configuration from the user interface.&lt;br /&gt;
&lt;br /&gt;
You can also reset a processing configuration to the installation pre-set by clicking the new reset button under Administration &amp;gt; Processing configuration.&lt;br /&gt;
&lt;br /&gt;
The documentation for using a custom processing configuration has also been updated.&lt;br /&gt;
&lt;br /&gt;
* Issues: [https://github.com/artefactual/archivematica/issues/1138 #1138], [https://github.com/artefactual/archivematica/issues/800 #800]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration (user manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#processing-configuration Processing configuration (administrator manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#using-a-custom-processing-configuration-file Using a custom processing configuration file]&lt;br /&gt;
&lt;br /&gt;
===MCP Batching for scalability &amp;amp; performance===&lt;br /&gt;
&lt;br /&gt;
This feature refactors how tasks are scheduled, executed and managed within Archivematica by grouping tasks into batches. It introduces processing efficiencies that significantly reduce the processing power and time required to complete Transfer and Ingest. It also includes new configuration options to further optimize processing efficiency for particular types of Transfers (e.g. a few large files vs. many small files) and for different deployment patterns (e.g. installing components across multiple machines).&lt;br /&gt;
&lt;br /&gt;
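The core idea of grouping tasks into batches can be sketched as a simple chunking generator. batch_size is a hypothetical parameter: in general, larger batches reduce per-task dispatch overhead (useful for many small files), while smaller batches spread work more evenly across workers (useful for a few large files). This illustrates the technique only; it is not the MCP scheduler itself.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import islice


def batch_tasks(tasks, batch_size):
    """Yield consecutive lists of at most batch_size tasks, preserving order."""
    if not batch_size >= 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(tasks)
    while True:
        # Pull the next batch_size tasks; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```
For example, seven file tasks with a batch size of 3 are dispatched as three batches of 3, 3 and 1 tasks instead of seven separate jobs.&lt;br /&gt;
&lt;br /&gt;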
This feature does not change the functionality or appearance of Archivematica.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/938 #938]&lt;br /&gt;
* Documentation: Scaling Archivematica [update with link when PR 182 is merged]&lt;br /&gt;
&lt;br /&gt;
==Fixed==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/16 ASCII codes can't decode when the filename contains a backtick]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/42 AIP re-ingest fails]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/43 PREMIS events from previous transfers are re-appearing]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1132 Metadata reingest fails when dc:type is null]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/46 Use 7-zip without compression (Copy) mode]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/140 Metadata added before &amp;quot;Approve Transfer&amp;quot; disappears]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/173 Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1104 Zip files with diacritic characters are failing to extract]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/123 DSpace REST login error in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/124 Unable to edit DSpace REST Space settings in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/220 restructureBagForComplianceFileUUIDsAssigned needs to create intermediate directories for Zipped bag transfers] - '''Community contribution''' by Hillel Arnold. Thank you!&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1050 Ingest fails if Archivematica isn't connected to the Internet]&lt;br /&gt;
&lt;br /&gt;
==Upgraded tools and dependencies==&lt;br /&gt;
&lt;br /&gt;
* Fido has been upgraded to version 1.3.12&lt;br /&gt;
* Siegfried has been upgraded to version 1.7.10&lt;br /&gt;
* FITS has been upgraded to version 1.1.0&lt;br /&gt;
&lt;br /&gt;
==End of life dependencies==&lt;br /&gt;
&lt;br /&gt;
===Archivists' Toolkit integration===&lt;br /&gt;
&lt;br /&gt;
Archivists' Toolkit has been deprecated since 2013, and the Archivists' Toolkit DIP upload feature has not had active development or testing since then. There are no plans to resume testing or to fix any problems with the feature. As a result, there is a [https://github.com/archivematica/Issues/issues/174 proposal to deprecate this feature in Archivematica 1.9]. Community feedback is welcome as comments on the GitHub issue.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12724</id>
		<title>Archivematica 1.8 and Storage Service 0.13 release notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12724"/>
		<updated>2018-10-24T19:54:50Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Dataverse Integration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Work in progress'''&lt;br /&gt;
&lt;br /&gt;
==Supported environments==&lt;br /&gt;
&lt;br /&gt;
Link to installation instructions.&lt;br /&gt;
&lt;br /&gt;
Specify supported environments.&lt;br /&gt;
&lt;br /&gt;
Make special note of any changes to supported environment.&lt;br /&gt;
&lt;br /&gt;
==Added==&lt;br /&gt;
&lt;br /&gt;
Describe new features.&lt;br /&gt;
&lt;br /&gt;
===New feature template===&lt;br /&gt;
&lt;br /&gt;
This is a description of this amazing feature! Here's why it's a net benefit to the project and the community. Also included are any special notes, like if it's a beta feature.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by some amazing institution. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: link&lt;br /&gt;
* Pull requests: link&lt;br /&gt;
&lt;br /&gt;
===Dataverse integration===&lt;br /&gt;
&lt;br /&gt;
Archivematica can now be configured to use a [https://dataverse.org/ Dataverse] research data repository as a transfer source location. Dataverse transfer source locations can be configured to display all available datasets or a subset of them. Datasets are retrieved directly using the Dataverse API and processed using a new “Dataverse” transfer type. New Dataverse-specific processing includes:&lt;br /&gt;
&lt;br /&gt;
* fixity checking using checksums generated by Dataverse&lt;br /&gt;
* retrieval of derivative and metadata files associated with tabular data files&lt;br /&gt;
* creation of a Dataverse METS file describing the dataset as retrieved from Dataverse&lt;br /&gt;
* Dataverse metadata included in the AIP METS&lt;br /&gt;
&lt;br /&gt;
Dataverse integration is currently a “Beta” feature. Some advanced or more complex use cases are not fully supported, such as datasets with restricted files, dataset versioning and dataset reingest. For a full list of known issues and enhancement ideas, refer to the [https://github.com/archivematica/Issues/labels/OCUL%3A%20AM-Dataverse Dataverse label in the Archivematica Issues repository] and the [https://wiki.archivematica.org/Dataverse Dataverse page on the Archivematica wiki].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issues: see the [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse waffle board] for all issues with the Dataverse label.&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/dataverse/ Dataverse Integration]&lt;br /&gt;
&lt;br /&gt;
===Processing configuration reset and download buttons===&lt;br /&gt;
&lt;br /&gt;
A new installation of Archivematica comes with a pre-set processing configuration called &amp;quot;default&amp;quot;, and a second one (used only in Jisc workflows) called &amp;quot;automated&amp;quot;. During testing, users are encouraged to adapt these configurations to their workflows, but they may later need to restore the installation pre-sets. A new reset button lets users easily return the default and automated processing configurations to their installation pre-sets.&lt;br /&gt;
&lt;br /&gt;
The second part of this feature is the addition of a download button for the processing configuration files. If you create a custom processing configuration, you can download the resulting processingMCP.xml file using the button and then include it at the top level of your transfer. Archivematica will then use this to automate your transfer selections, rather than the default configuration.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration documentation]&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1138 #1138]&lt;br /&gt;
&lt;br /&gt;
==Changed==&lt;br /&gt;
&lt;br /&gt;
Describe enhancements or major fixes.&lt;br /&gt;
&lt;br /&gt;
===Streamline checksum verification===&lt;br /&gt;
&lt;br /&gt;
This enhancement removes duplicated checksum verification in Archivematica, which improves performance when processing large transfers (many files and/or large files). It includes three changes:&lt;br /&gt;
&lt;br /&gt;
* Remove the &amp;quot;Verify checksums generated on ingest&amp;quot; micro-service&lt;br /&gt;
* Enhance the &amp;quot;Verify AIP&amp;quot; micro-service to query the database in bulk for transfer-generated checksums and then verify that they match the checksums recorded in the bag's manifest-&amp;lt;ALGORITHM&amp;gt;.txt file.&lt;br /&gt;
* Have &amp;quot;Verify AIP&amp;quot; create an AIP-level &amp;quot;fixity check&amp;quot; PREMIS:EVENT that it can pass to the Storage Service, which will document this verification in the pointer file.&lt;br /&gt;
&lt;br /&gt;
This should not affect regular workflows, but note that uncompressed AIPs, which don't have pointer files, get no AIP-level fixity check PREMIS event; object-level fixity events are still recorded for them. This limitation is tracked in the Storage Service repository: [https://github.com/artefactual/archivematica-storage-service/issues/324 Problem: uncompressed AIPs need pointer files #324]&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Columbia University Library. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/918 #918]&lt;br /&gt;
* Pull requests: [https://github.com/artefactual/archivematica/pull/1012 PR 1012]&lt;br /&gt;
&lt;br /&gt;
===File format identification updates===&lt;br /&gt;
&lt;br /&gt;
Archivematica 1.8 is now up to date with PRONOM v94! For more information on the data added to PRONOM, see the [http://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml PRONOM release notes].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by the Denver Art Museum. Thank you!&lt;br /&gt;
&lt;br /&gt;
===Indexing can be enabled/disabled for Transfers and/or Archival Storage===&lt;br /&gt;
&lt;br /&gt;
Previously, Elasticsearch indexing could only be disabled globally, as a scalability measure, since indexing consumes a lot of resources. However, this also disabled the Backlog and Appraisal features, which rely on indexing and which some users still wanted to access. As of release 1.8, Archivematica can be deployed with indexing enabled only for Transfers (Backlog and Appraisal enabled), only for Archival Storage (Backlog and Appraisal disabled), for both indexes, or for neither.&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1172 #1172]&lt;br /&gt;
&lt;br /&gt;
===Configure email settings===&lt;br /&gt;
&lt;br /&gt;
This change adds more options for configuring Archivematica's email client, including letting an administrator set the sender address for emails sent by Archivematica (e.g. normalization reports, failure reports) to comply with local IT requirements.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Pull request: [https://github.com/artefactual/archivematica-docs/pull/208 archivematica-docs PR 208]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/docs/archivematica-1.8/admin-manual/installation-setup/customization/customization/#email-notification-configuration Email notification configuration]&lt;br /&gt;
&lt;br /&gt;
===Download processing configuration and reset to default===&lt;br /&gt;
&lt;br /&gt;
Previous versions of Archivematica introduced the ability to add custom processing configurations, but users had to retrieve the custom configuration file via the command line to use it. There is now a download button under Administration &amp;gt; Processing configuration so that you can download a processing configuration from the user interface.&lt;br /&gt;
&lt;br /&gt;
You can also reset a processing configuration to the installation pre-set by clicking the new reset button under Administration &amp;gt; Processing configuration.&lt;br /&gt;
&lt;br /&gt;
The documentation for using a custom processing configuration has also been updated.&lt;br /&gt;
&lt;br /&gt;
* Issues: [https://github.com/artefactual/archivematica/issues/1138 #1138], [https://github.com/artefactual/archivematica/issues/800 #800]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration (user manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#processing-configuration Processing configuration (administrator manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#using-a-custom-processing-configuration-file Using a custom processing configuration file]&lt;br /&gt;
&lt;br /&gt;
===MCP Batching for scalability &amp;amp; performance===&lt;br /&gt;
&lt;br /&gt;
This feature refactors how tasks are scheduled, executed and managed within Archivematica by grouping tasks into batches. It introduces processing efficiencies that significantly reduce the processing power and time required to complete Transfer and Ingest. It also includes new configuration options to further optimize processing efficiency for particular types of Transfers (e.g. a few large files vs. many small files) and for different deployment patterns (e.g. installing components across multiple machines).&lt;br /&gt;
&lt;br /&gt;
This feature does not change the functionality or appearance of Archivematica.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/938 #938]&lt;br /&gt;
* Documentation: Scaling Archivematica [update with link when PR 182 is merged]&lt;br /&gt;
&lt;br /&gt;
==Fixed==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/16 ASCII codes can't decode when the filename contains a backtick]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/42 AIP re-ingest fails]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/43 PREMIS events from previous transfers are re-appearing]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1132 Metadata reingest fails when dc:type is null]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/46 Use 7-zip without compression (Copy) mode]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/140 Metadata added before &amp;quot;Approve Transfer&amp;quot; disappears]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/173 Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1104 Zip files with diacritic characters are failing to extract]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/123 DSpace REST login error in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/124 Unable to edit DSpace REST Space settings in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/220 restructureBagForComplianceFileUUIDsAssigned needs to create intermediate directories for Zipped bag transfers] - '''Community contribution''' by Hillel Arnold. Thank you!&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1050 Ingest fails if Archivematica isn't connected to the Internet]&lt;br /&gt;
&lt;br /&gt;
==Upgraded tools and dependencies==&lt;br /&gt;
&lt;br /&gt;
* Fido has been upgraded to version 1.3.12&lt;br /&gt;
* Siegfried has been upgraded to version 1.7.10&lt;br /&gt;
* FITS has been upgraded to version 1.1.0&lt;br /&gt;
&lt;br /&gt;
==End of life dependencies==&lt;br /&gt;
&lt;br /&gt;
===Archivists' Toolkit integration===&lt;br /&gt;
&lt;br /&gt;
Archivists' Toolkit has been deprecated since 2013, and the Archivists' Toolkit DIP upload feature has not had active development or testing since then. There are no plans to resume testing or to fix any problems with the feature. As a result, there is a [https://github.com/archivematica/Issues/issues/174 proposal to deprecate this feature in Archivematica 1.9]. Community feedback is welcome as comments on the GitHub issue.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12723</id>
		<title>Archivematica 1.8 and Storage Service 0.13 release notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12723"/>
		<updated>2018-10-24T19:53:28Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Dataverse Integration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Work in progress'''&lt;br /&gt;
&lt;br /&gt;
==Supported environments==&lt;br /&gt;
&lt;br /&gt;
Link to installation instructions.&lt;br /&gt;
&lt;br /&gt;
Specify supported environments.&lt;br /&gt;
&lt;br /&gt;
Make special note of any changes to supported environment.&lt;br /&gt;
&lt;br /&gt;
==Added==&lt;br /&gt;
&lt;br /&gt;
Describe new features.&lt;br /&gt;
&lt;br /&gt;
===New feature template===&lt;br /&gt;
&lt;br /&gt;
This is a description of this amazing feature! Here's why it's a net benefit to the project and the community. Also included are any special notes, like if it's a beta feature.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by some amazing institution. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: link&lt;br /&gt;
* Pull requests: link&lt;br /&gt;
&lt;br /&gt;
===Dataverse Integration===&lt;br /&gt;
&lt;br /&gt;
Archivematica can now be configured to use a [https://dataverse.org/ Dataverse] research data repository as a transfer source location. Dataverse transfer source locations can be configured to display all available datasets or a subset of them. Datasets are retrieved directly using the Dataverse API and processed using a new “Dataverse” transfer type. New Dataverse-specific processing includes:&lt;br /&gt;
&lt;br /&gt;
* fixity checking using checksums generated by Dataverse&lt;br /&gt;
* retrieval of derivative and metadata files associated with tabular data files&lt;br /&gt;
* creation of a Dataverse METS file describing the dataset as retrieved from Dataverse&lt;br /&gt;
* Dataverse metadata included in the AIP METS&lt;br /&gt;
&lt;br /&gt;
Dataverse integration is currently a “Beta” feature. Some advanced or more complex use cases are not fully supported, such as datasets with restricted files, dataset versioning and dataset reingest. For a full list of known issues and enhancement ideas, refer to the [https://github.com/archivematica/Issues/labels/OCUL%3A%20AM-Dataverse Dataverse label in the Archivematica Issues repository] and the [https://wiki.archivematica.org/Dataverse Dataverse page on the Archivematica wiki].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issues: see the [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse waffle board] for all issues with the Dataverse label.&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/dataverse/ Dataverse Integration]&lt;br /&gt;
&lt;br /&gt;
===Processing configuration reset and download buttons===&lt;br /&gt;
&lt;br /&gt;
A new installation of Archivematica comes with a pre-set processing configuration called &amp;quot;default&amp;quot;, and a second one (used only in Jisc workflows) called &amp;quot;automated&amp;quot;. During testing, users are encouraged to adapt these configurations to their workflows, but they may later need to restore the installation pre-sets. A new reset button lets users easily return the default and automated processing configurations to their installation pre-sets.&lt;br /&gt;
&lt;br /&gt;
The second part of this feature is the addition of a download button for the processing configuration files. If you create a custom processing configuration, you can download the resulting processingMCP.xml file using the button and then include it at the top level of your transfer. Archivematica will then use this to automate your transfer selections, rather than the default configuration.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration documentation]&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1138 #1138]&lt;br /&gt;
&lt;br /&gt;
==Changed==&lt;br /&gt;
&lt;br /&gt;
Describe enhancements or major fixes.&lt;br /&gt;
&lt;br /&gt;
===Streamline checksum verification===&lt;br /&gt;
&lt;br /&gt;
This enhancement removes duplicated checksum verification in Archivematica, which improves performance when processing large transfers (many files and/or large files). It includes three changes:&lt;br /&gt;
&lt;br /&gt;
* Remove the &amp;quot;Verify checksums generated on ingest&amp;quot; micro-service&lt;br /&gt;
* Enhance the &amp;quot;Verify AIP&amp;quot; micro-service to bulk query the database for transfer-generated checksums and then verify that they match what is documented in the bag-generated manifest-&amp;lt;ALGORITHM&amp;gt;.txt.&lt;br /&gt;
* Have &amp;quot;Verify AIP&amp;quot; create an AIP-level &amp;quot;fixity check&amp;quot; PREMIS:EVENT that it can pass to the Storage Service, which will document this verification in the pointer file.&lt;br /&gt;
&lt;br /&gt;
This should not affect regular workflows, but note that uncompressed AIPs, which don't have pointer files, do not receive an AIP-level fixity check PREMIS event. Object-level fixity events are still recorded for uncompressed AIPs. This limitation is tracked in the Storage Service repository: [https://github.com/artefactual/archivematica-storage-service/issues/324 Problem: uncompressed AIPs need pointer files #324]&lt;br /&gt;
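&lt;br /&gt;
As a rough illustration of the comparison step, the Python sketch below parses a BagIt manifest (one checksum and one path per line) and compares it with checksums recorded during transfer. It is not Archivematica's implementation. The function names are hypothetical, and the recorded argument is a plain dictionary that stands in for the result of the single bulk database query described above.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_manifest(text):
    """Map each payload path in a BagIt manifest to its checksum.

    Each non-empty manifest line holds a checksum, whitespace, then a path,
    e.g. "9f86d081...  data/objects/report.pdf".
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once so paths containing spaces stay intact.
        checksum, path = line.split(None, 1)
        entries[path] = checksum.lower()
    return entries


def find_mismatches(recorded, manifest_text):
    """Return sorted paths whose recorded checksum is missing from,
    or differs from, the bag manifest.

    recorded maps path -> checksum. It is a hypothetical stand-in for
    the rows returned by one bulk database query.
    """
    manifest = parse_manifest(manifest_text)
    return sorted(
        path
        for path, checksum in recorded.items()
        if manifest.get(path) != checksum.lower()
    )


manifest_text = "abc123  data/objects/a.txt\nff00ee  data/objects/b.txt\n"
print(find_mismatches({"data/objects/a.txt": "ABC123",
                       "data/objects/b.txt": "123456"}, manifest_text))
```
&lt;br /&gt;
Both sides are lower-cased before comparison, so upper- and lower-case hex digests of the same value are not reported as mismatches.&lt;br /&gt;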
&lt;br /&gt;
This work was sponsored by Columbia University Library. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/918 #918]&lt;br /&gt;
* Pull requests: [https://github.com/artefactual/archivematica/pull/1012 PR 1012]&lt;br /&gt;
&lt;br /&gt;
===File format identification updates===&lt;br /&gt;
&lt;br /&gt;
Archivematica 1.8 is now up to date with PRONOM v.94! For more information on new data added to PRONOM, check the [http://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml PRONOM release notes].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by the Denver Art Museum. Thank you!&lt;br /&gt;
&lt;br /&gt;
===Indexing can be enabled/disabled for Transfers and/or Archival Storage===&lt;br /&gt;
&lt;br /&gt;
Previously, Elasticsearch indexing could only be disabled globally, as a scalability measure, since indexing consumes a lot of resources. However, this also disabled the Backlog and Appraisal features, which rely on indexing and which some users still wanted to access. As of release 1.8, Archivematica can be deployed with indexing enabled only for Transfers (Backlog and Appraisal enabled), only for Archival Storage (Backlog and Appraisal disabled), for both indexes, or for neither.&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1172 #1172]&lt;br /&gt;
&lt;br /&gt;
===Configure email settings===&lt;br /&gt;
&lt;br /&gt;
This change expands the configuration options for the email client in Archivematica. For example, an administrator can now set the sender address for emails sent by Archivematica (e.g. normalization reports, failure reports) to comply with local IT requirements.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Pull request: [https://github.com/artefactual/archivematica-docs/pull/208 archivematica-docs PR 208]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/docs/archivematica-1.8/admin-manual/installation-setup/customization/customization/#email-notification-configuration Email notification configuration]&lt;br /&gt;
&lt;br /&gt;
===Download processing configuration and reset to default===&lt;br /&gt;
&lt;br /&gt;
Previous versions of Archivematica introduced the ability to add custom processing configurations, but users had to retrieve the custom configuration file via the command line in order to use it. There is now a download button on Administration &amp;gt; Processing configuration so that you can download the processing configuration from the user interface.&lt;br /&gt;
&lt;br /&gt;
You can also reset a processing configuration to the installation pre-set by clicking on the new reset button on Administration &amp;gt; Processing configuration.&lt;br /&gt;
&lt;br /&gt;
The documentation for using a custom processing configuration has also been updated.&lt;br /&gt;
&lt;br /&gt;
* Issues: [https://github.com/artefactual/archivematica/issues/1138 #1138], [https://github.com/artefactual/archivematica/issues/800 #800]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration (user manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#processing-configuration Processing configuration (administrator manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#using-a-custom-processing-configuration-file Using a custom processing configuration file]&lt;br /&gt;
&lt;br /&gt;
===MCP Batching for scalability &amp;amp; performance===&lt;br /&gt;
&lt;br /&gt;
This feature refactors how tasks are scheduled, executed, and managed within Archivematica by grouping tasks into batches. It introduces processing efficiencies that significantly decrease the processing power and time required to complete Transfer and Ingest. It also adds new configuration options to further optimize processing efficiency for particular types of Transfers (e.g. few large files vs. many small files) and for different deployment patterns (e.g. installing components across multiple machines).&lt;br /&gt;
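&lt;br /&gt;
The batching idea can be sketched generically in Python. This is not the MCP's actual scheduler. The make_batches helper and the batch size of 3 are hypothetical. Each batch is dispatched as one job, so fixed per-job overhead (queueing and dispatching a task) is paid once per batch rather than once per file. That is the general rationale behind grouping tasks.&lt;br /&gt;
&lt;br /&gt;
```python
def make_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements.

    Hypothetical helper illustrating task batching; not Archivematica code.
    """
    if batch_size > 0:
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    raise ValueError("batch_size must be a positive integer")


files = ["file%d.tif" % n for n in range(7)]
batches = make_batches(files, 3)
# Seven files become three jobs instead of seven:
# [['file0.tif', 'file1.tif', 'file2.tif'],
#  ['file3.tif', 'file4.tif', 'file5.tif'],
#  ['file6.tif']]
print(len(batches), batches)
```
&lt;br /&gt;
A larger batch size reduces per-job overhead. A smaller one spreads work across more workers. The right value depends on the transfer's file count and sizes.&lt;br /&gt;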
&lt;br /&gt;
This feature does not impact the functionality or appearance of Archivematica.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/938 #938]&lt;br /&gt;
* Documentation: Scaling Archivematica [update with link when PR 182 is merged]&lt;br /&gt;
&lt;br /&gt;
==Fixed==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/16 ASCII codec can't decode when the filename contains a backtick]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/42 AIP re-ingest fails]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/43 PREMIS events from previous transfers are re-appearing]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1132 Metadata reingest fails when dc:type is null]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/46 Use 7-zip without compression (Copy) mode]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/140 Metadata added before &amp;quot;Approve Transfer&amp;quot; disappears]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/173 Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1104 Zip files with diacritic characters are failing to extract]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/123 DSpace REST login error in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/124 Unable to edit DSpace REST Space settings in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/220 restructureBagForComplianceFileUUIDsAssigned needs to create intermediate directories for Zipped bag transfers] - '''Community contribution''' by Hillel Arnold. Thank you!&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1050 Ingest fails if Archivematica isn't connected to the Internet]&lt;br /&gt;
&lt;br /&gt;
==Upgraded tools and dependencies==&lt;br /&gt;
&lt;br /&gt;
* Fido has been upgraded to version 1.3.12&lt;br /&gt;
* Siegfried has been upgraded to version 1.7.10&lt;br /&gt;
* FITS has been upgraded to version 1.1.0&lt;br /&gt;
&lt;br /&gt;
==End of life dependencies==&lt;br /&gt;
&lt;br /&gt;
===Archivists' Toolkit integration===&lt;br /&gt;
&lt;br /&gt;
Archivists' Toolkit has been deprecated since 2013, and the Archivists' Toolkit DIP upload feature has not had active development or testing since then. There are no plans to test the feature or to fix any problems with it. As a result, there is a [https://github.com/archivematica/Issues/issues/174 proposal to deprecate this feature in Archivematica 1.9]. Community feedback is welcome via a comment on the issue in GitHub.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12722</id>
		<title>Archivematica 1.8 and Storage Service 0.13 release notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12722"/>
		<updated>2018-10-24T19:52:38Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Added */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main_Page|Home]] &amp;gt; [[Release_Notes|Release Notes]] &amp;gt; Major release notes template&lt;br /&gt;
&lt;br /&gt;
'''Work in progress'''&lt;br /&gt;
&lt;br /&gt;
==Supported environments==&lt;br /&gt;
&lt;br /&gt;
Link to installation instructions.&lt;br /&gt;
&lt;br /&gt;
Specify supported environments.&lt;br /&gt;
&lt;br /&gt;
Make special note of any changes to supported environment.&lt;br /&gt;
&lt;br /&gt;
==Added==&lt;br /&gt;
&lt;br /&gt;
Describe new features.&lt;br /&gt;
&lt;br /&gt;
===New feature template===&lt;br /&gt;
&lt;br /&gt;
This is a description of this amazing feature! Here's why it's a net benefit to the project and the community. Also included are any special notes, like if it's a beta feature.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by some amazing institution. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: link&lt;br /&gt;
* Pull requests: link&lt;br /&gt;
&lt;br /&gt;
===Dataverse Integration===&lt;br /&gt;
&lt;br /&gt;
Archivematica can now be configured to use a [https://dataverse.org/ Dataverse] research data repository as a Transfer source location. Dataverse transfer source locations can be configured to display all available datasets or a subset of them. Datasets are retrieved directly using the Dataverse API and processed using a new “Dataverse” transfer type. New Dataverse-specific processing includes:&lt;br /&gt;
&lt;br /&gt;
* fixity checking using checksums generated by Dataverse&lt;br /&gt;
* retrieval of derivative and metadata files associated with tabular data files&lt;br /&gt;
* creation of a Dataverse METS file describing the dataset as retrieved from Dataverse&lt;br /&gt;
* Dataverse metadata included in the AIP METS&lt;br /&gt;
&lt;br /&gt;
Dataverse integration is currently a “Beta” feature. Some advanced or more complex use cases are not fully supported, such as handling of datasets with restricted files, versioning of datasets, and reingest of datasets. For a full list of known issues and enhancement ideas, refer to the [https://github.com/archivematica/Issues/labels/OCUL%3A%20AM-Dataverse Dataverse label in the Archivematica Issues repository] and the [https://wiki.archivematica.org/Dataverse Archivematica Wiki].&lt;br /&gt;
&lt;br /&gt;
This feature has been sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: See [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse waffle board] for all issues with the Dataverse label.&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/dataverse/ Dataverse Integration]&lt;br /&gt;
&lt;br /&gt;
===Processing configuration reset and download buttons===&lt;br /&gt;
&lt;br /&gt;
A new installation of Archivematica comes with a pre-set processing configuration called &amp;quot;default&amp;quot;, and a second one (used only in Jisc workflows) called &amp;quot;automated&amp;quot;. In testing, users are encouraged to change the configurations to suit their workflows, but may need to reset the configuration to the installation pre-sets. A reset button has been added so that users can easily change the default and automated processing configurations back to their installation pre-sets.&lt;br /&gt;
&lt;br /&gt;
The second part of this feature is the addition of a download button for the processing configuration files. If you create a custom processing configuration, you can download the resulting processingMCP.xml file using the button and then include it at the top level of your transfer. Archivematica will then use this to automate your transfer selections, rather than the default configuration.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration documentation]&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1138 #1138]&lt;br /&gt;
&lt;br /&gt;
==Changed==&lt;br /&gt;
&lt;br /&gt;
Describe enhancements or major fixes.&lt;br /&gt;
&lt;br /&gt;
===Streamline checksum verification===&lt;br /&gt;
&lt;br /&gt;
This enhancement de-duplicates checksum verification in Archivematica, which helps to improve the performance of Archivematica in processing large transfers (many files and/or large files). This enhancement includes three changes:&lt;br /&gt;
&lt;br /&gt;
* Remove the &amp;quot;Verify checksums generated on ingest&amp;quot; micro-service&lt;br /&gt;
* Enhance the &amp;quot;Verify AIP&amp;quot; micro-service to bulk query the database for transfer-generated checksums and then verify that they match what is documented in the bag-generated manifest-&amp;lt;ALGORITHM&amp;gt;.txt.&lt;br /&gt;
* Have &amp;quot;Verify AIP&amp;quot; create an AIP-level &amp;quot;fixity check&amp;quot; PREMIS:EVENT that it can pass to the Storage Service, which will document this verification in the pointer file.&lt;br /&gt;
&lt;br /&gt;
This should not affect regular workflows, but note that uncompressed AIPs, which don't have pointer files, do not receive an AIP-level fixity check PREMIS event. Object-level fixity events are still recorded for uncompressed AIPs. This limitation is tracked in the Storage Service repository: [https://github.com/artefactual/archivematica-storage-service/issues/324 Problem: uncompressed AIPs need pointer files #324]&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Columbia University Library. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/918 #918]&lt;br /&gt;
* Pull requests: [https://github.com/artefactual/archivematica/pull/1012 PR 1012]&lt;br /&gt;
&lt;br /&gt;
===File format identification updates===&lt;br /&gt;
&lt;br /&gt;
Archivematica 1.8 is now up to date with PRONOM v.94! For more information on new data added to PRONOM, check the [http://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml PRONOM release notes].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by the Denver Art Museum. Thank you!&lt;br /&gt;
&lt;br /&gt;
===Indexing can be enabled/disabled for Transfers and/or Archival Storage===&lt;br /&gt;
&lt;br /&gt;
Previously, Elasticsearch indexing could only be disabled globally, as a scalability measure, since indexing consumes a lot of resources. However, this also disabled the Backlog and Appraisal features, which rely on indexing and which some users still wanted to access. As of release 1.8, Archivematica can be deployed with indexing enabled only for Transfers (Backlog and Appraisal enabled), only for Archival Storage (Backlog and Appraisal disabled), for both indexes, or for neither.&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1172 #1172]&lt;br /&gt;
&lt;br /&gt;
===Configure email settings===&lt;br /&gt;
&lt;br /&gt;
This change expands the configuration options for the email client in Archivematica. For example, an administrator can now set the sender address for emails sent by Archivematica (e.g. normalization reports, failure reports) to comply with local IT requirements.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Pull request: [https://github.com/artefactual/archivematica-docs/pull/208 archivematica-docs PR 208]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/docs/archivematica-1.8/admin-manual/installation-setup/customization/customization/#email-notification-configuration Email notification configuration]&lt;br /&gt;
&lt;br /&gt;
===Download processing configuration and reset to default===&lt;br /&gt;
&lt;br /&gt;
Previous versions of Archivematica introduced the ability to add custom processing configurations, but users had to retrieve the custom configuration file via the command line in order to use it. There is now a download button on Administration &amp;gt; Processing configuration so that you can download the processing configuration from the user interface.&lt;br /&gt;
&lt;br /&gt;
You can also reset a processing configuration to the installation pre-set by clicking on the new reset button on Administration &amp;gt; Processing configuration.&lt;br /&gt;
&lt;br /&gt;
The documentation for using a custom processing configuration has also been updated.&lt;br /&gt;
&lt;br /&gt;
* Issues: [https://github.com/artefactual/archivematica/issues/1138 #1138], [https://github.com/artefactual/archivematica/issues/800 #800]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration (user manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#processing-configuration Processing configuration (administrator manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#using-a-custom-processing-configuration-file Using a custom processing configuration file]&lt;br /&gt;
&lt;br /&gt;
===MCP Batching for scalability &amp;amp; performance===&lt;br /&gt;
&lt;br /&gt;
This feature refactors how tasks are scheduled, executed, and managed within Archivematica by grouping tasks into batches. It introduces processing efficiencies that significantly decrease the processing power and time required to complete Transfer and Ingest. It also adds new configuration options to further optimize processing efficiency for particular types of Transfers (e.g. few large files vs. many small files) and for different deployment patterns (e.g. installing components across multiple machines).&lt;br /&gt;
&lt;br /&gt;
This feature does not impact the functionality or appearance of Archivematica.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/938 #938]&lt;br /&gt;
* Documentation: Scaling Archivematica [update with link when PR 182 is merged]&lt;br /&gt;
&lt;br /&gt;
==Fixed==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/16 ASCII codec can't decode when the filename contains a backtick]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/42 AIP re-ingest fails]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/43 PREMIS events from previous transfers are re-appearing]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1132 Metadata reingest fails when dc:type is null]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/46 Use 7-zip without compression (Copy) mode]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/140 Metadata added before &amp;quot;Approve Transfer&amp;quot; disappears]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/173 Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1104 Zip files with diacritic characters are failing to extract]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/123 DSpace REST login error in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/124 Unable to edit DSpace REST Space settings in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/220 restructureBagForComplianceFileUUIDsAssigned needs to create intermediate directories for Zipped bag transfers] - '''Community contribution''' by Hillel Arnold. Thank you!&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1050 Ingest fails if Archivematica isn't connected to the Internet]&lt;br /&gt;
&lt;br /&gt;
==Upgraded tools and dependencies==&lt;br /&gt;
&lt;br /&gt;
* Fido has been upgraded to version 1.3.12&lt;br /&gt;
* Siegfried has been upgraded to version 1.7.10&lt;br /&gt;
* FITS has been upgraded to version 1.1.0&lt;br /&gt;
&lt;br /&gt;
==End of life dependencies==&lt;br /&gt;
&lt;br /&gt;
===Archivists' Toolkit integration===&lt;br /&gt;
&lt;br /&gt;
Archivists' Toolkit has been deprecated since 2013, and the Archivists' Toolkit DIP upload feature has not had active development or testing since then. There are no plans to test the feature or to fix any problems with it. As a result, there is a [https://github.com/archivematica/Issues/issues/174 proposal to deprecate this feature in Archivematica 1.9]. Community feedback is welcome via a comment on the issue in GitHub.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12721</id>
		<title>Archivematica 1.8 and Storage Service 0.13 release notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Archivematica_1.8_and_Storage_Service_0.13_release_notes&amp;diff=12721"/>
		<updated>2018-10-24T17:23:45Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Changed */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main_Page|Home]] &amp;gt; [[Release_Notes|Release Notes]] &amp;gt; Major release notes template&lt;br /&gt;
&lt;br /&gt;
'''Work in progress'''&lt;br /&gt;
&lt;br /&gt;
==Supported environments==&lt;br /&gt;
&lt;br /&gt;
Link to installation instructions.&lt;br /&gt;
&lt;br /&gt;
Specify supported environments.&lt;br /&gt;
&lt;br /&gt;
Make special note of any changes to supported environment.&lt;br /&gt;
&lt;br /&gt;
==Added==&lt;br /&gt;
&lt;br /&gt;
Describe new features.&lt;br /&gt;
&lt;br /&gt;
===New feature template===&lt;br /&gt;
&lt;br /&gt;
This is a description of this amazing feature! Here's why it's a net benefit to the project and the community. Also included are any special notes, like if it's a beta feature.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by some amazing institution. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: link&lt;br /&gt;
* Pull requests: link&lt;br /&gt;
&lt;br /&gt;
===Processing configuration reset and download buttons===&lt;br /&gt;
&lt;br /&gt;
A new installation of Archivematica comes with a pre-set processing configuration called &amp;quot;default&amp;quot;, and a second one (used only in Jisc workflows) called &amp;quot;automated&amp;quot;. In testing, users are encouraged to change the configurations to suit their workflows, but may need to reset the configuration to the installation pre-sets. A reset button has been added so that users can easily change the default and automated processing configurations back to their installation pre-sets.&lt;br /&gt;
&lt;br /&gt;
The second part of this feature is the addition of a download button for the processing configuration files. If you create a custom processing configuration, you can download the resulting processingMCP.xml file using the button and then include it at the top level of your transfer. Archivematica will then use this to automate your transfer selections, rather than the default configuration.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration documentation]&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1138 #1138]&lt;br /&gt;
&lt;br /&gt;
==Changed==&lt;br /&gt;
&lt;br /&gt;
Describe enhancements or major fixes.&lt;br /&gt;
&lt;br /&gt;
===Streamline checksum verification===&lt;br /&gt;
&lt;br /&gt;
This enhancement de-duplicates checksum verification in Archivematica, which helps to improve the performance of Archivematica in processing large transfers (many files and/or large files). This enhancement includes three changes:&lt;br /&gt;
&lt;br /&gt;
* Remove the &amp;quot;Verify checksums generated on ingest&amp;quot; micro-service&lt;br /&gt;
* Enhance the &amp;quot;Verify AIP&amp;quot; micro-service to bulk query the database for transfer-generated checksums and then verify that they match what is documented in the bag-generated manifest-&amp;lt;ALGORITHM&amp;gt;.txt.&lt;br /&gt;
* Have &amp;quot;Verify AIP&amp;quot; create an AIP-level &amp;quot;fixity check&amp;quot; PREMIS:EVENT that it can pass to the Storage Service, which will document this verification in the pointer file.&lt;br /&gt;
&lt;br /&gt;
This should not affect regular workflows, but note that uncompressed AIPs, which don't have pointer files, do not receive an AIP-level fixity check PREMIS event. Object-level fixity events are still recorded for uncompressed AIPs. This limitation is tracked in the Storage Service repository: [https://github.com/artefactual/archivematica-storage-service/issues/324 Problem: uncompressed AIPs need pointer files #324]&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Columbia University Library. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/918 #918]&lt;br /&gt;
* Pull requests: [https://github.com/artefactual/archivematica/pull/1012 PR 1012]&lt;br /&gt;
&lt;br /&gt;
===File format identification updates===&lt;br /&gt;
&lt;br /&gt;
Archivematica 1.8 is now up to date with PRONOM v.94! For more information on new data added to PRONOM, check the [http://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml PRONOM release notes].&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by the Denver Art Museum. Thank you!&lt;br /&gt;
&lt;br /&gt;
===Indexing can be enabled/disabled for Transfers and/or Archival Storage===&lt;br /&gt;
&lt;br /&gt;
Previously, Elasticsearch indexing could only be disabled globally, as a scalability measure, since indexing consumes a lot of resources. However, this also disabled the Backlog and Appraisal features, which rely on indexing and which some users still wanted to access. As of release 1.8, Archivematica can be deployed with indexing enabled only for Transfers (Backlog and Appraisal enabled), only for Archival Storage (Backlog and Appraisal disabled), for both indexes, or for neither.&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/1172 #1172]&lt;br /&gt;
&lt;br /&gt;
===Configure email settings===&lt;br /&gt;
&lt;br /&gt;
This change expands the configuration options for the email client in Archivematica. For example, an administrator can now set the sender address for emails sent by Archivematica (e.g. normalization reports, failure reports) to comply with local IT requirements.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Pull request: [https://github.com/artefactual/archivematica-docs/pull/208 archivematica-docs PR 208]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/docs/archivematica-1.8/admin-manual/installation-setup/customization/customization/#email-notification-configuration Email notification configuration]&lt;br /&gt;
&lt;br /&gt;
===Download processing configuration and reset to default===&lt;br /&gt;
&lt;br /&gt;
Previous versions of Archivematica introduced the ability to add custom processing configurations, but users had to retrieve the custom configuration file via the command line in order to use it. There is now a download button on Administration &amp;gt; Processing configuration so that you can download the processing configuration from the user interface.&lt;br /&gt;
&lt;br /&gt;
You can also reset a processing configuration to the installation pre-set by clicking on the new reset button on Administration &amp;gt; Processing configuration.&lt;br /&gt;
&lt;br /&gt;
The documentation for using a custom processing configuration has also been updated.&lt;br /&gt;
&lt;br /&gt;
* Issues: [https://github.com/artefactual/archivematica/issues/1138 #1138], [https://github.com/artefactual/archivematica/issues/800 #800]&lt;br /&gt;
* Documentation: [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/administer/dashboard-admin/#processing-configuration Processing configuration (user manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#processing-configuration Processing configuration (administrator manual)], [https://www.archivematica.org/en/docs/archivematica-1.8/admin-manual/installation-setup/customization/dashboard-config/#using-a-custom-processing-configuration-file Using a custom processing configuration file]&lt;br /&gt;
&lt;br /&gt;
===MCP Batching for scalability &amp;amp; performance===&lt;br /&gt;
&lt;br /&gt;
This feature refactors how tasks are scheduled, executed, and managed within Archivematica by grouping tasks into batches. It introduces processing efficiencies that significantly decrease the processing power and time required to complete Transfer and Ingest. It also adds new configuration options to further optimize processing efficiency for particular types of Transfers (e.g. few large files vs. many small files) and for different deployment patterns (e.g. installing components across multiple machines).&lt;br /&gt;
&lt;br /&gt;
This feature does not impact the functionality or appearance of Archivematica.&lt;br /&gt;
&lt;br /&gt;
This work was sponsored by Jisc. Thank you!&lt;br /&gt;
&lt;br /&gt;
* Issue: [https://github.com/artefactual/archivematica/issues/938 #938]&lt;br /&gt;
* Documentation: Scaling Archivematica [update with link when PR 182 is merged]&lt;br /&gt;
&lt;br /&gt;
==Fixed==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/16 ASCII codec can't decode when the filename contains a backtick]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/42 AIP re-ingest fails]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/43 PREMIS events from previous transfers are re-appearing]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1132 Metadata reingest fails when dc:type is null]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/46 Use 7-zip without compression (Copy) mode]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/140 Metadata added before &amp;quot;Approve Transfer&amp;quot; disappears]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/173 Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label]&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1104 Zip files with diacritic characters are failing to extract]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/123 DSpace REST login error in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/124 Unable to edit DSpace REST Space settings in SS]&lt;br /&gt;
* [https://github.com/archivematica/Issues/issues/220 restructureBagForComplianceFileUUIDsAssigned needs to create intermediate directories for Zipped bag transfers] - '''Community contribution''' by Hillel Arnold. Thank you!&lt;br /&gt;
* [https://github.com/artefactual/archivematica/issues/1050 Ingest fails if Archivematica isn't connected to the Internet]&lt;br /&gt;
&lt;br /&gt;
==Upgraded tools and dependencies==&lt;br /&gt;
&lt;br /&gt;
* Fido has been upgraded to version 1.3.12&lt;br /&gt;
* Siegfried has been upgraded to version 1.7.10&lt;br /&gt;
* FITS has been upgraded to version 1.1.0&lt;br /&gt;
&lt;br /&gt;
==End of life dependencies==&lt;br /&gt;
&lt;br /&gt;
===Archivists' Toolkit integration===&lt;br /&gt;
&lt;br /&gt;
Archivists' Toolkit has been deprecated since 2013. The Archivists' Toolkit DIP upload feature has not had active development or testing since then, and there are no plans to resume testing or to fix any problems with it. As a result, there is a [https://github.com/archivematica/Issues/issues/174 proposal to deprecate this feature in Archivematica 1.9]. Community response is welcome via a comment on the issue in GitHub.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12653</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12653"/>
		<updated>2018-09-12T16:14:11Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled for inclusion in Archivematica 1.8. To see the current status of the work and any outstanding issues, see the Waffle board linked below:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview of Dataverse to Archivematica Integration ==&lt;br /&gt;
&lt;br /&gt;
=== Feature Files ===&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''1) User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
'''2) Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
&lt;br /&gt;
'''3) Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
Archivematica creates a metadata file called agents.json that includes the agent information configured in the Storage Service. This information is used to populate the PREMIS agent details in the METS files. See [[Dataverse#Agents Metadata file - agents.json|agents.json]] for more details. &lt;br /&gt;
&lt;br /&gt;
When a dataset includes a &amp;quot;bundle&amp;quot; of related files for tabular data, the bundle is provided as a .zip file, and Archivematica extracts all of the files in it at this stage. Other .zip files are not affected; whether they are extracted is controlled by the standard processing configuration options. See TO DO - ADD LINK TO dataset section&lt;br /&gt;
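&lt;br /&gt;
The bundle extraction step can be sketched roughly as follows. This is a minimal illustration in Python, not Archivematica's actual implementation. The function name extract_bundles is hypothetical. The sketch assumes the caller already knows which .zip files are Dataverse bundles (for example, from dataset.json), so other .zip files are left for the standard processing configuration. Each bundle is unpacked into a directory named after the archive, which matches the sample AIP structure on this page. Deleting the bundle .zip afterwards is also an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
import zipfile
from pathlib import Path


def extract_bundles(bundle_zips):
    """Unpack each Dataverse bundle into a directory named after it.

    Hypothetical helper: Weather_data.zip becomes Weather_data/. Only the
    paths passed in are touched, so any other .zip files in the transfer
    are left for the standard processing configuration.
    """
    extracted = []
    for zip_path in map(Path, bundle_zips):
        target_dir = zip_path.with_suffix("")  # drop the .zip extension
        target_dir.mkdir(exist_ok=True)
        with zipfile.ZipFile(zip_path) as bundle:
            bundle.extractall(target_dir)
        # Assumption: the bundle archive itself is not kept in the AIP.
        zip_path.unlink()
        extracted.append(target_dir)
    return extracted
```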
&lt;br /&gt;
'''4) Transfer &amp;amp; Ingest'''&lt;br /&gt;
&lt;br /&gt;
Archivematica performs transfer and ingest processes using the standard processing configuration options. Additional processing for Dataverse datasets includes:&lt;br /&gt;
* creating a Dataverse METS that describes the dataset as provided by Dataverse&lt;br /&gt;
* fixity check of files using checksums provided by Dataverse&lt;br /&gt;
* including Dataverse metadata (from the Dataverse METS) in the final AIP METS &lt;br /&gt;
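&lt;br /&gt;
Below is a minimal sketch of the fixity check against Dataverse checksums. It assumes the expected MD5 values have already been read from dataset.json into a mapping of relative path to checksum. The function names md5_of and verify_dataverse_checksums are hypothetical. The Pass and Fail values mirror the eventOutcome value used in the PREMIS fixity check example on this page.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 in chunks so large data files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataverse_checksums(objects_dir, expected):
    """Compare files against the MD5 values supplied by Dataverse.

    expected maps a path (relative to objects_dir) to the MD5 hex digest
    taken from dataset.json. Only files listed there are checked, since
    Dataverse does not supply checksums for derivatives or metadata files.
    Returns a dict of path to "Pass" or "Fail".
    """
    results = {}
    for rel_path, md5_value in expected.items():
        actual = md5_of(Path(objects_dir) / rel_path)
        results[rel_path] = "Pass" if actual == md5_value.lower() else "Fail"
    return results
```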
&lt;br /&gt;
'''5) Store the AIP'''&lt;br /&gt;
&lt;br /&gt;
The AIP is stored in whatever location has been configured. Scholars Portal intends to store its AIPs in an S3 location (a standard configuration option as of Storage Service version 0.12). &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Dataverse Datasets ==&lt;br /&gt;
&lt;br /&gt;
=== Dataset Metadata file - dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each (original) data file. It does not currently provide checksums for derivatives or metadata files created by Dataverse.&lt;br /&gt;
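&lt;br /&gt;
The sketch below shows one way to read the title and per-file MD5 checksums from dataset.json. It assumes the Dataverse 4.x native API layout (latestVersion, metadataBlocks, files, dataFile). These key names are assumptions that may differ between Dataverse versions, so check them against a real dataset.json. The function name load_dataset_summary is hypothetical. Files without an md5 key are skipped, because Dataverse supplies checksums only for original data files.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def load_dataset_summary(dataset_json_text):
    """Pull the title and per-file MD5 checksums out of dataset.json.

    Assumes the Dataverse 4.x native API layout (latestVersion, files,
    dataFile); verify these keys against a real dataset.json first.
    """
    dataset = json.loads(dataset_json_text)
    version = dataset["latestVersion"]

    # The title lives in the citation metadata block as typeName "title".
    title = None
    for field in version["metadataBlocks"]["citation"]["fields"]:
        if field["typeName"] == "title":
            title = field["value"]

    # Map each data file name to the MD5 that Dataverse supplied, if any.
    checksums = {}
    for entry in version.get("files", []):
        data_file = entry["dataFile"]
        if "md5" in data_file:
            checksums[data_file["filename"]] = data_file["md5"]
    return title, checksums
```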
&lt;br /&gt;
&lt;br /&gt;
=== Agents Metadata file - agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
=== Bundles for tabular data files ===&lt;br /&gt;
&lt;br /&gt;
When Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/index.html ingests some forms of tabular data], it creates derivatives of the original data file and additional metadata files. All of these files are provided in a [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] as a zipped package, containing: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* The original file uploaded by the user&lt;br /&gt;
* Different derivative (alternative) formats of the original file (e.g. tab-delimited file, R data file)&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file)&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''TO DO''' - update notes on how bundles are retrieved. The original version of this documentation included the following notes, which need to be updated or corrected: &lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file has a content_type of tab-separated values, Archivematica issues an API call for multiple-file (&amp;quot;bundled&amp;quot;) content download. For TSV files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
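&lt;br /&gt;
The decision described in this note can be sketched as follows. The endpoint paths are those of the Dataverse Data Access API: /api/access/datafile/ID for a single file and /api/access/datafile/bundle/ID for a bundle. The function name download_url and the example host are hypothetical, and API token handling is omitted.&lt;br /&gt;
&lt;br /&gt;
```python
# Content type that Dataverse reports for ingested tabular files.
TSV_CONTENT_TYPE = "text/tab-separated-values"


def download_url(base_url, data_file):
    """Choose the Data Access API endpoint for one dataFile entry.

    Tabular files (served by Dataverse as TSV) are fetched as a zipped
    bundle; every other file is fetched on its own.
    """
    base = base_url.rstrip("/")
    file_id = data_file["id"]
    if data_file.get("contentType") == TSV_CONTENT_TYPE:
        return f"{base}/api/access/datafile/bundle/{file_id}"
    return f"{base}/api/access/datafile/{file_id}"
```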
&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
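&lt;br /&gt;
As a small illustration, the hypothetical helper below locates these files in an unpacked AIP by following the path pattern above. It is only a sketch. It assumes the AIP has already been extracted from its .7z package, and it returns None for any file that is missing.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path


def _existing(path):
    """Return the path if the file exists, otherwise None."""
    return path if path.is_file() else None


def find_dataverse_files(aip_root):
    """List the Dataverse metadata files for each transfer in an AIP.

    Follows the pattern AIP/data/objects/metadata/transfers/TRANSFER/,
    where the Dataverse METS.xml, dataset.json and agents.json are stored.
    """
    transfers = Path(aip_root) / "data" / "objects" / "metadata" / "transfers"
    found = []
    for transfer_dir in sorted(p for p in transfers.iterdir() if p.is_dir()):
        found.append({
            "transfer": transfer_dir.name,
            "mets": _existing(transfer_dir / "METS.xml"),
            "dataset_json": _existing(transfer_dir / "dataset.json"),
            "agents_json": _existing(transfer_dir / "agents.json"),
        })
    return found
```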
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups: USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
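&lt;br /&gt;
The fileGrp assignment and GROUPID rules described here, and in the metadata sources table on this page, can be expressed as a short sketch. The function names and the Group- prefix used for GROUPID values are hypothetical. The only inputs assumed are the file names in a bundle and the name of the original file uploaded by the researcher.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid

# Suffixes that mark a file as metadata (per the fileGrp USE="metadata" row).
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_group(name, in_bundle, bundle_original=None):
    """Return the METS fileGrp USE value for one file."""
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if in_bundle and name != bundle_original:
        # Anything else unpacked from a bundle was generated by Dataverse.
        return "derivative"
    # Non-bundle files and the researcher's uploaded file are originals.
    return "original"


def describe_bundle(bundle_files, bundle_original):
    """Classify the files from one bundle and give them a shared GROUPID."""
    group_id = "Group-" + str(uuid.uuid4())  # hypothetical GROUPID format
    return [
        {
            "name": name,
            "use": file_group(name, True, bundle_original),
            "GROUPID": group_id,
        }
        for name in bundle_files
    ]
```

Applied to the sample bundle above, this sketch assigns the SAV file to original, the TAB and R files to derivative, and the DDI, EndNote and RIS files to metadata. All six files share one GROUPID.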
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing, search and discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
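&lt;br /&gt;
The sketch below makes the mapping concrete. It uses Python's xml.etree.ElementTree to build a DDI stdyDscr fragment from values named as in the table above. It illustrates the mapping under stated assumptions and is not the code Archivematica uses. The input is assumed to be a flat dictionary of values already pulled out of dataset.json. The namespace ddi:codebook:2_5 is assumed to be the DDI Codebook 2.5 namespace. Joining authority and identifier with a single slash is an assumption based on the IDNo example in the table.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"  # assumed DDI Codebook 2.5 namespace
ET.register_namespace("ddi", DDI_NS)


def _sub(parent, tag, text=None, **attrs):
    """Append a namespaced DDI child element, optionally with text."""
    element = ET.SubElement(parent, "{%s}%s" % (DDI_NS, tag), attrs)
    if text is not None:
        element.text = str(text)
    return element


def ddi_study_description(meta):
    """Build a DDI stdyDscr fragment from flat dataset.json values.

    meta keys follow the table: title, authority, identifier, protocol,
    authorName, publisher, releaseTime, versionState, versionNumber,
    versionMinorNumber and termsOfUse.
    """
    study = ET.Element("{%s}stdyDscr" % DDI_NS)
    citation = _sub(study, "citation")

    titl_stmt = _sub(citation, "titlStmt")
    _sub(titl_stmt, "titl", meta["title"])
    # IDNo joins authority and identifier; protocol becomes the agency.
    id_no = meta["authority"].rstrip("/") + "/" + meta["identifier"]
    _sub(titl_stmt, "IDNo", id_no, agency=meta["protocol"])

    _sub(_sub(citation, "rspStmt"), "AuthEnty", meta["authorName"])
    _sub(_sub(citation, "distStmt"), "distrbtr", meta["publisher"])

    # Version text is versionNumber.versionMinorNumber, e.g. 1.0.
    version = "%s.%s" % (meta["versionNumber"], meta["versionMinorNumber"])
    _sub(_sub(citation, "verStmt"), "version", version,
         date=meta["releaseTime"], type=meta["versionState"])

    _sub(_sub(_sub(study, "dataAccs"), "useStmt"), "restrctn", meta["termsOfUse"])
    return study
```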
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
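&lt;br /&gt;
The paired relationships can be generated programmatically. The sketch below uses xml.etree.ElementTree to build the premis:relationship element for each side of an original and derivative pair, following the structure of the XML examples on this page. The PREMIS 3 namespace URI and the function names are assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"  # assumed PREMIS 3 namespace
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a PREMIS element name with the namespace."""
    return "{%s}%s" % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    """Build a premis:relationship pointing at the related file's UUID."""
    relationship = ET.Element(q("relationship"))
    ET.SubElement(relationship, q("relationshipType")).text = "derivation"
    ET.SubElement(relationship, q("relationshipSubType")).text = sub_type
    related = ET.SubElement(relationship, q("relatedObjectIdentification"))
    ET.SubElement(related, q("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(related, q("relatedObjectIdentifierValue")).text = related_uuid
    return relationship


def link_original_and_derivative(original_uuid, derivative_uuid):
    """Return the relationship for the original and for the derivative."""
    return (
        # The original (e.g. the SPSS SAV file) is the source of the TAB file.
        derivation_relationship("is source of", derivative_uuid),
        # The derivative (e.g. the TAB file) has the original as its source.
        derivation_relationship("has source", original_uuid),
    )
```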
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017), https://www.loc.gov/standards/premis/v3/preservation-events.pdf, does not yet include derivation as a controlled term.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
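&lt;br /&gt;
Because the AIP is a BagIt bag, its payload can be checked against manifest-sha512.txt once the package has been extracted. The sketch below is a simplified, hypothetical checker; verify_bag_manifest is not an Archivematica function. It does not handle percent-encoded file names or tag manifests, so a dedicated library such as bagit-python is a better choice for full validation.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir, manifest_name="manifest-sha512.txt"):
    """Return payload paths that are missing or fail their manifest checksum.

    Each manifest line holds a checksum, whitespace, and a path relative to
    the bag root (for example data/METS.UUID.xml). The hash algorithm is
    taken from the manifest name, e.g. sha512.
    """
    bag = Path(bag_dir)
    algorithm = manifest_name[len("manifest-"):-len(".txt")]
    problems = []
    manifest = (bag / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append(rel_path)
    return problems
```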
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json] &lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
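&lt;br /&gt;
To inspect a real AIP METS file against this outline, a quick sketch like the one below counts the mets:file entries in each fileGrp. The METS namespace http://www.loc.gov/METS/ is the standard one. The function name files_by_group is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def files_by_group(mets_path):
    """Summarise the fileSec of an AIP METS file.

    Returns a dict mapping each fileGrp USE value (original, derivative,
    metadata, preservation, ...) to the number of mets:file entries that
    sit directly in it; nested fileGrps are counted under their own USE.
    """
    root = ET.parse(mets_path).getroot()
    counts = {}
    for group in root.iterfind("mets:fileSec//mets:fileGrp", METS_NS):
        use = group.get("USE", "unspecified")
        count = len(group.findall("mets:file", METS_NS))
        counts[use] = counts.get(use, 0) + count
    return counts
```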
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are either using a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs (Archival Information Collections) as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be solutions other than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that recording an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance by differentiating the researcher who uploaded the files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of a fixity check for any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;br /&gt;
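&lt;br /&gt;
The difference between a UNF and an ordinary checksum can be illustrated with a minimal Python sketch of the general normalize-then-hash idea. This is not the UNF algorithm itself: the comma-separated input, the rounding to 7 significant digits and the function names (normalize_value, semantic_hash, binary_checksum) are simplifying assumptions for illustration only.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def normalize_value(value: str, digits: int = 7) -> str:
    """Canonicalize one cell (illustrative, not the UNF spec).

    Numbers are rounded to `digits` significant digits in exponential
    notation; anything else is kept as stripped text.
    """
    try:
        return format(float(value), f".{digits - 1}e")
    except ValueError:
        return value.strip()


def semantic_hash(text: str) -> str:
    """Hash the normalized cells of a comma-separated table."""
    cells = []
    for line in text.splitlines():
        cells.extend(normalize_value(cell) for cell in line.split(","))
    # Cells come from split lines, so a newline cannot occur inside one.
    canonical = "\n".join(cells)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def binary_checksum(text: str) -> str:
    """Plain MD5 of the raw bytes, as a byte-level fixity check computes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical encodings of the same values.
table_a = "1.0,2\n3.50,x"
table_b = "1,2.000\n3.5,x"
print(semantic_hash(table_a) == semantic_hash(table_b))
print(binary_checksum(table_a) == binary_checksum(table_b))
```
&lt;br /&gt;
Both sample tables hash to the same value under the normalized scheme while their MD5 checksums differ, which suggests that verifying a UNF would mean recomputing it from the data rather than performing a byte-level fixity check.&lt;br /&gt;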
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12652</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12652"/>
		<updated>2018-09-12T15:54:50Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Dataverse Workflow */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early proof-of-concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8). This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of the work and any outstanding issues, please see the Waffle board [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse linked below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview of Dataverse to Archivematica Integration ==&lt;br /&gt;
&lt;br /&gt;
=== Feature Files ===&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
  Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
    And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''1) User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
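&lt;br /&gt;
As a rough illustration of filtering on search terms or on a particular dataverse, the Python sketch below builds a Dataverse Search API query using its q, type, subtree, per_page and start parameters. The server URL, the dataverse alias and the function name are hypothetical, and this is not necessarily how the Storage Service builds its own requests.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def dataset_search_url(base_url: str, query: str = "*", subtree: str = "",
                       per_page: int = 10, start: int = 0) -> str:
    """Build a Dataverse Search API URL that returns datasets only."""
    params = {"q": query, "type": "dataset",
              "per_page": per_page, "start": start}
    if subtree:
        # Restrict results to one dataverse, identified by its alias.
        params["subtree"] = subtree
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical server and alias, for illustration only.
url = dataset_search_url("https://dataverse.example.org/",
                         query="weather", subtree="ocul")
print(url)
```
&lt;br /&gt;
Paging through results would mean increasing start by per_page until the total_count reported in the JSON response is reached.&lt;br /&gt;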
&lt;br /&gt;
'''2) Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
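&lt;br /&gt;
A minimal sketch of the retrieval call, assuming the Dataverse Native API endpoint /api/datasets/:persistentId/ and an API token passed in the X-Dataverse-key header. The server URL, token and function name are placeholders, the DOI is assembled from the protocol, authority and identifier example values in the metadata table on this page, and the request is only prepared here, not sent.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode
from urllib.request import Request


def dataset_request(base_url: str, persistent_id: str,
                    api_token: str) -> Request:
    """Prepare (but do not send) a Native API request for a dataset's JSON."""
    query = urlencode({"persistentId": persistent_id})
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/?{query}"
    # Dataverse reads the API token from the X-Dataverse-key header.
    return Request(url, headers={"X-Dataverse-key": api_token})


# Placeholder server and token; the DOI reuses the example values.
req = dataset_request("https://dataverse.example.org",
                      "doi:10.5072/FK2/0MOPJM", "hypothetical-token")
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req), which returns JSON bytes.
```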
&lt;br /&gt;
'''3) Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
Archivematica creates a metadata file called agents.json that includes the agent information configured in the Storage Service. This information is used to populate the PREMIS agent details in the METS files. See [[Dataverse#agents.json]] for more details. &lt;br /&gt;
&lt;br /&gt;
When a dataset includes a &amp;quot;bundle&amp;quot; of related files for tabular data, it is provided as a .zip file. Archivematica extracts all of the files in bundles at this stage. Other .zip files are not affected, and can be extracted or not using the standard processing configuration options. See TO DO - ADD LINK TO dataset section&lt;br /&gt;
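&lt;br /&gt;
The bundle-only extraction rule could be sketched in Python as follows. It assumes the names of the bundle zips are already known (for example from the tabular entries in dataset.json) and that each bundle is unpacked into a directory named after the zip, as in the sample AIP structure on this page; the function name extract_bundles is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import zipfile
from pathlib import Path


def extract_bundles(transfer_dir: Path, bundle_names: set) -> list:
    """Unpack only the zip files named in bundle_names; leave other zips alone.

    Each bundle is extracted into a directory named after the zip (minus
    its extension) and the zip itself is then removed.
    """
    extracted = []
    for zip_path in sorted(transfer_dir.glob("*.zip")):
        if zip_path.name not in bundle_names:
            continue  # ordinary zip: left to the processing configuration
        target = transfer_dir / zip_path.stem
        with zipfile.ZipFile(zip_path) as bundle:
            bundle.extractall(target)
        zip_path.unlink()
        extracted.append(target)
    return extracted
```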
&lt;br /&gt;
'''4) Transfer &amp;amp; Ingest'''&lt;br /&gt;
&lt;br /&gt;
Archivematica performs transfer and ingest processes using the standard processing configuration options. Additional processing for Dataverse datasets includes:&lt;br /&gt;
* creating a Dataverse METS that describes the dataset as provided by Dataverse&lt;br /&gt;
* fixity check of files using checksums provided by Dataverse&lt;br /&gt;
* including Dataverse metadata (from the Dataverse METS) in the final AIP METS &lt;br /&gt;
&lt;br /&gt;
'''5) Store the AIP'''&lt;br /&gt;
&lt;br /&gt;
The AIP is stored in whatever location has been configured. Scholars Portal intends to store its AIPs in an S3 location (which is a standard configuration option as of Storage Service version 0.12). &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''' move all of this to Dataverse dataset section '''&lt;br /&gt;
The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each data file.&lt;br /&gt;
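&lt;br /&gt;
To show how the per-file information might be read, the Python sketch below lists each data file with its id, filename, content type and MD5. It assumes the layout returned by the Dataverse 4.x Native API, in which every entry under latestVersion.files carries a dataFile object; the entity_id fields mentioned above belong to an earlier format and are not modelled. The function name and sample values are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def list_data_files(dataset_json: str) -> list:
    """Return (id, filename, contentType, md5) tuples for the latest version."""
    dataset = json.loads(dataset_json)
    # Accept either the full API response ({"data": {...}}) or the bare object.
    version = dataset.get("data", dataset).get("latestVersion", {})
    files = []
    for entry in version.get("files", []):
        data_file = entry.get("dataFile", {})
        files.append((
            data_file.get("id"),
            data_file.get("filename"),
            data_file.get("contentType"),
            data_file.get("md5"),  # None if no MD5 was supplied
        ))
    return files


# Hypothetical sample modelled on the study described on this page.
sample = json.dumps({"data": {"latestVersion": {"files": [
    {"label": "Study_info.pdf",
     "dataFile": {"id": 11, "filename": "Study_info.pdf",
                  "contentType": "application/pdf",
                  "md5": "hypothetical-md5-1"}},
    {"label": "YVR_weather_data.tab",
     "dataFile": {"id": 12, "filename": "YVR_weather_data.tab",
                  "contentType": "text/tab-separated-values",
                  "md5": "hypothetical-md5-2"}},
]}}})

for file_id, name, content_type, md5 in list_data_files(sample):
    print(file_id, name, content_type, md5)
```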
&lt;br /&gt;
[4] If the JSON file has a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
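&lt;br /&gt;
The choice between a single-file and a bundled download could look like this in Python, assuming the Dataverse Access API endpoints /api/access/datafile/{id} and /api/access/datafile/bundle/{id}. Using the content type as the switch follows the description above; the function name and base URL are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
TABULAR_TYPE = "text/tab-separated-values"


def download_url(base_url: str, file_id: int, content_type: str) -> str:
    """Pick the Access API endpoint for one data file.

    Tabular files (stored by Dataverse as tab-separated values) are fetched
    as a zipped bundle; everything else is fetched as a single file.
    """
    base = base_url.rstrip("/")
    # Ignore parameters such as "; charset=..." when comparing types.
    media_type = content_type.split(";")[0].strip()
    if media_type == TABULAR_TYPE:
        return f"{base}/api/access/datafile/bundle/{file_id}"
    return f"{base}/api/access/datafile/{file_id}"
```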
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format).&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual].&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the JSON file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
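&lt;br /&gt;
As a sketch of how the fileSec and GROUPID attributes described above could be generated, the Python example below uses xml.etree.ElementTree with the METS and XLink namespaces. The ID and GROUPID value formats (file-N, Group-N), the directory layout of the sample paths and the function name are assumptions for illustration, not the values Archivematica writes.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_file_sec(files: list) -> ET.Element:
    """Build a mets:fileSec from (path, use, bundle) tuples.

    Files sharing a bundle name get the same GROUPID; files that did not
    come from a bundle (bundle is None) get no GROUPID.
    """
    file_sec = ET.Element(f"{{{METS_NS}}}fileSec")
    groups = {}     # USE value -> mets:fileGrp element
    group_ids = {}  # bundle name -> GROUPID value
    for index, (path, use, bundle) in enumerate(files, start=1):
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE=use)
        attrs = {"ID": f"file-{index}"}
        if bundle is not None:
            group_ids.setdefault(bundle, f"Group-{len(group_ids) + 1}")
            attrs["GROUPID"] = group_ids[bundle]
        mets_file = ET.SubElement(groups[use], f"{{{METS_NS}}}file", attrs)
        ET.SubElement(mets_file, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                       f"{{{XLINK_NS}}}href": path})
    return file_sec


# Hypothetical paths based on the sample study above.
sample = [
    ("Study_info.pdf", "original", None),
    ("YVR_weather_data/YVR_weather_data.sav", "original", "YVR_weather_data"),
    ("YVR_weather_data/YVR_weather_data.tab", "derivative", "YVR_weather_data"),
    ("YVR_weather_data/YVR_weather_data-ddi.xml", "metadata", "YVR_weather_data"),
    ("dataset.json", "metadata", None),
]
print(ET.tostring(build_file_sec(sample), encoding="unicode"))
```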
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing, search and discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
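&lt;br /&gt;
The fileGrp rules in this table can be summarized as a small Python function. It assumes the caller already knows whether a file came from a bundle and which file in that bundle is the researcher's original (for example from originalFileFormat in dataset.json); the function name file_group is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_group(filename: str, in_bundle: bool, bundle_original: str = "") -> str:
    """Return the fileGrp USE value for one file, following the table above."""
    if filename.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    # Everything else in a bundle, apart from the uploaded original, is derived.
    if in_bundle and filename != bundle_original:
        return "derivative"
    return "original"


for name in ("YVR_weather_data.sav", "YVR_weather_data.tab",
             "YVR_weather_data-ddi.xml"):
    print(name, file_group(name, True, "YVR_weather_data.sav"))
```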
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
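&lt;br /&gt;
The two relationship examples above are mirror images of each other, which a small Python helper can make explicit. The PREMIS 3 namespace URI used here is an assumption (Archivematica's METS files may declare a different PREMIS version), and the function name derivation_relationship is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"  # assumption: PREMIS 3 namespace
ET.register_namespace("premis", PREMIS_NS)


def derivation_relationship(subtype: str, related_uuid: str) -> ET.Element:
    """Build a premis:relationship of type 'derivation'.

    Use subtype 'is source of' on the original (pointing at the derivative)
    and 'has source' on the derivative (pointing at the original).
    """
    def child(parent, tag, text=None):
        element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
        element.text = text
        return element

    relationship = ET.Element(f"{{{PREMIS_NS}}}relationship")
    child(relationship, "relationshipType", "derivation")
    child(relationship, "relationshipSubType", subtype)
    related = child(relationship, "relatedObjectIdentification")
    child(related, "relatedObjectIdentifierType", "UUID")
    child(related, "relatedObjectIdentifierValue", related_uuid)
    return relationship


sav_uuid, tab_uuid = str(uuid.uuid4()), str(uuid.uuid4())
on_sav = derivation_relationship("is source of", tab_uuid)
on_tab = derivation_relationship("has source", sav_uuid)
print(ET.tostring(on_tab, encoding="unicode"))
```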
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
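&lt;br /&gt;
A sketch of the comparison behind this event, in Python. It uses MD5 because that is the checksum type dataset.json is described as providing on this page (the example event above records hashlib.sha256()); the function names and the wording of the outcome note are assumptions rather than the Verify Transfer Checksums microservice's actual implementation.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's MD5 in chunks so large data files are never read whole."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path: Path, dataverse_md5: str) -> tuple:
    """Compare a file with the MD5 that Dataverse supplied for it.

    Returns an (eventOutcome, eventOutcomeDetailNote) pair shaped like the
    PREMIS fixity check event above.
    """
    actual = md5_of(path)
    if actual == dataverse_md5.strip().lower():
        return "Pass", f"Dataverse checksum {dataverse_md5} verified"
    return "Fail", f"Dataverse checksum {dataverse_md5} does not match {actual}"
```
&lt;br /&gt;
Reading in fixed-size chunks keeps memory use flat regardless of how large the data files in a dataset are.&lt;br /&gt;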
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset, and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
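&lt;br /&gt;
To show what the bag manifest is for, here is a minimal Python sketch that re-verifies the payload files listed in manifest-sha512.txt of an uncompressed bag, following the BagIt rule that each manifest line holds a checksum followed by a file path. In practice a maintained library such as the Library of Congress bagit-python package would normally be used; the function name here is hypothetical, and a compressed (.7z) AIP would have to be extracted first.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir: Path, algorithm: str = "sha512") -> list:
    """Check every payload file listed in manifest-ALGORITHM.txt.

    Returns the relative paths whose digest does not match (empty = valid).
    """
    failures = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is: checksum, whitespace, path relative to the bag root.
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        digest = hashlib.new(algorithm)
        digest.update((bag_dir / rel_path).read_bytes())
        if digest.hexdigest() != expected.lower():
            failures.append(rel_path)
    return failures
```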
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json] &lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the Dataverse METS file, as follows:&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface, although it is not mentioned in the Dataverse API guide. Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and she wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like us to use either a standard API (such as OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to determine whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to each other. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed that this could be made even more specific by adding more agents, for instance differentiating between the researcher who uploaded files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;br /&gt;
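&lt;br /&gt;
To illustrate why a UNF behaves differently from a checksum, the Python sketch below computes a deliberately simplified, hypothetical fingerprint: each value is rounded to 7 significant digits, written in one canonical exponential form and hashed with SHA-256, so two files that store the same numbers differently get the same fingerprint while their plain checksums differ. This is not the UNF v6 algorithm, which has its own rules for missing values, character data, truncation and encoding; the function names are illustrative only.&lt;br /&gt;
```python
import base64
import hashlib


def simplified_fingerprint(values, digits=7):
    """Hypothetical UNF-like fingerprint; NOT the real UNF v6 algorithm.

    Each value is rounded to `digits` significant digits and written in a
    canonical exponential form, so formatting differences in the source
    file do not change the result.
    """
    canonical = []
    for value in values:
        # Canonical text form, e.g. 3.14159265 -> "+3.141593e+00"
        canonical.append(format(float(value), "+.{}e".format(digits - 1)))
    # A separator keeps ["1", "23"] and ["12", "3"] from colliding.
    payload = "\n".join(canonical).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep 128 bits and Base64-encode them for a short printable signature.
    return base64.b64encode(digest[:16]).decode("ascii")


def plain_checksum(raw_bytes):
    """Ordinary checksum of the raw file bytes, for comparison."""
    return hashlib.md5(raw_bytes).hexdigest()
```
For example, simplified_fingerprint(["1.50", "2"]) and simplified_fingerprint(["1.5", "2.0"]) return the same string, while plain_checksum(b"1.50,2") and plain_checksum(b"1.5,2.0") differ.&lt;br /&gt;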
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12651</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12651"/>
		<updated>2018-09-12T15:21:55Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview of Dataverse to Archivematica Integration ==&lt;br /&gt;
&lt;br /&gt;
=== Feature Files ===&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset, &lt;br /&gt;
    including descriptive metadata, a list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
  Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
    And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
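&lt;br /&gt;
The exact calls the Storage Service makes are not described here. As a sketch, assuming the Dataverse Native API is used, a request for a dataset's JSON metadata can be built from the server URL, the dataset's persistent identifier and the API token stored with the Space, which Dataverse accepts in the X-Dataverse-key header. The function name, host, DOI and token below are placeholders.&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def dataset_request(server_url, persistent_id, api_token):
    """Build (but do not send) a Native API request for a dataset's JSON.

    All three arguments are placeholders supplied by the caller.
    """
    base = server_url.rstrip("/")
    # Keep ":" and "/" readable in the DOI; other characters are percent-encoded.
    pid = urllib.parse.quote(persistent_id, safe=":/")
    url = "{}/api/datasets/:persistentId/?persistentId={}".format(base, pid)
    # The API token travels in a header rather than in the query string.
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})
```
Passing the returned request to urllib.request.urlopen would send it; sending the token as a header makes it less likely to end up in web server access logs than a query parameter would.&lt;br /&gt;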
&lt;br /&gt;
[3] '''Prepare Transfer''' &lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file lists a data file with a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) download of that file. This returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
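&lt;br /&gt;
The decision in step [4] can be sketched as follows, assuming a simplified file list in which each entry carries a dataFile object with id, filename and contentType (modelled loosely on Dataverse's dataset JSON; check the real structure before relying on it). Tabular files are routed to the bundle endpoint /api/access/datafile/bundle/{id} and other files to the basic /api/access/datafile/{id} download, following the Dataverse Data Access API as we understand it. The function name is hypothetical.&lt;br /&gt;
```python
TSV = "text/tab-separated-values"


def plan_downloads(files, server_url):
    """Return (filename, url, is_bundle) for each file in the dataset JSON.

    Each entry of `files` is assumed to look like
    {"dataFile": {"id": 12, "filename": "x.tab", "contentType": TSV}}.
    """
    base = server_url.rstrip("/")
    plan = []
    for entry in files:
        data_file = entry["dataFile"]
        is_bundle = data_file.get("contentType") == TSV
        if is_bundle:
            # Zip holding the .tab file, the original upload, derivatives,
            # the DDI XML file and the citation files.
            url = "{}/api/access/datafile/bundle/{}".format(base, data_file["id"])
        else:
            # Plain download of a single non-tabular file.
            url = "{}/api/access/datafile/{}".format(base, data_file["id"])
        plan.append((data_file["filename"], url, is_bundle))
    return plan
```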
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object that, according to the Dataverse documentation, contains the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
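&lt;br /&gt;
Each bundle ends up as its own directory in the AIP (see the AIP structure section). A minimal sketch using Python's standard zipfile module, assuming the directory is named after the zip file without its extension; that naming rule and the function name unpack_bundle are hypothetical, not Archivematica's extract-packages micro-service.&lt;br /&gt;
```python
import zipfile
from pathlib import Path


def unpack_bundle(bundle_path, dest_root):
    """Extract a Dataverse bundle zip into its own directory.

    The target directory is named after the zip file minus its extension
    (an assumption made for this sketch). Returns sorted member names.
    """
    bundle_path = Path(bundle_path)
    target = Path(dest_root) / bundle_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_path) as bundle:
        # Refuse absolute paths or ".." components before extracting anything.
        for name in bundle.namelist():
            if name.startswith("/") or ".." in Path(name).parts:
                raise ValueError("unsafe member path: " + name)
        bundle.extractall(target)
        return sorted(bundle.namelist())
```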
&lt;br /&gt;
[5] Archivematica generates a METS file consisting of a dmdSec containing the DDI elements extracted from the JSON file, plus a fileSec and structMap indicating the relationships between the files in the transfer (e.g. the original uploaded data file, derivative files generated for tabular data, and metadata/citation files). This allows Archivematica to apply appropriate preservation micro-services to different file types and to provide an accurate representation of the study in the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the JSON file, plus the METS file generated in step [5].&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The endnote and ris files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI xml file.&lt;br /&gt;
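&lt;br /&gt;
The grouping rules described here and in the metadata-source table below (.json, .ris, -ddi.xml and -endnote.xml files go to USE=&amp;quot;metadata&amp;quot;; researcher uploads to USE=&amp;quot;original&amp;quot;; remaining bundle members to USE=&amp;quot;derivative&amp;quot;; one GROUPID per bundle) can be sketched with Python's xml.etree.ElementTree. This is a simplified illustration rather than Archivematica's implementation: the function names, the input shapes and the ID and GROUPID formats are hypothetical, and FLocat, checksums and other attributes are omitted.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Suffix rule from the metadata-source table for USE="metadata".
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def use_for(name, original_names):
    """Return the fileGrp USE value for one file name."""
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if name in original_names:
        return "original"
    # Anything else that arrived in a bundle was generated by Dataverse.
    return "derivative"


def build_file_sec(loose_files, bundles, original_names):
    """Build a simplified mets:fileSec element.

    loose_files: files retrieved individually, e.g. dataset.json or a PDF.
    bundles: dict mapping each bundle name to the names of its members.
    original_names: set of files uploaded by the researcher.
    """
    file_sec = ET.Element("{%s}fileSec" % METS_NS)
    groups = {
        use: ET.SubElement(file_sec, "{%s}fileGrp" % METS_NS, USE=use)
        for use in ("original", "derivative", "metadata")
    }

    def add(name, group_id=None):
        attrs = {"ID": "file-" + name.replace(" ", "_")}
        if group_id:
            attrs["GROUPID"] = group_id
        parent = groups[use_for(name, original_names)]
        ET.SubElement(parent, "{%s}file" % METS_NS, attrs)

    for name in loose_files:
        add(name)
    # Every member of the same bundle shares one GROUPID.
    for index, bundle in enumerate(sorted(bundles), start=1):
        for name in bundles[bundle]:
            add(name, "Group-%d" % index)
    return file_sec
```
Applied to the sample study above, the PDF and SAV files land in the original group, the TAB and RData files in the derivative group, and the four metadata files in the metadata group, with all six bundle members sharing one GROUPID.&lt;br /&gt;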
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing &amp;amp; search / discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
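&lt;br /&gt;
A minimal sketch of the crosswalk in the table above, assuming a simplified, flattened dataset JSON in which citation fields are a list of typeName/value objects and the other keys sit at the top level as named in the "Information source" column. The real Dataverse JSON nests these keys more deeply, so the key paths and the function name ddi_fields are assumptions; joining authority and identifier into IDNo follows the table's example row.&lt;br /&gt;
```python
def ddi_fields(dataset):
    """Map a simplified dataset JSON dict to the DDI values in the table.

    The flat key layout is an assumption made for this sketch.
    """
    citation = {f["typeName"]: f["value"] for f in dataset.get("citation", [])}
    return {
        "titl": citation.get("title"),
        # e.g. "10.5072/FK2/" + "0MOPJM" -> "10.5072/FK2/0MOPJM"
        "IDNo": dataset["authority"] + dataset["identifier"],
        "IDNo/@agency": dataset.get("protocol"),
        "AuthEnty": citation.get("authorName"),
        "distrbtr": dataset.get("publisher"),
        "version/@date": dataset.get("releaseTime"),
        "version/@type": dataset.get("versionState"),
        "version": "{}.{}".format(dataset.get("versionNumber"),
                                  dataset.get("versionMinorNumber")),
        "restrctn": dataset.get("termsOfUse"),
    }
```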
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
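&lt;br /&gt;
The comparison behind this event can be sketched with Python's hashlib: recompute the digest of the received file and compare it with the value listed in dataset.json. The sketch defaults to MD5, the algorithm the metadata-source table lists for Dataverse checksums, but takes the algorithm as a parameter because the example event above names SHA-256. The function names and the outcome wording are illustrative.&lt;br /&gt;
```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=2 ** 20):
    """Hex digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, dataverse_checksum, algorithm="md5"):
    """Compare a file with the checksum Dataverse supplied for it.

    Returns (outcome, note), shaped like the event example above.
    """
    expected = dataverse_checksum.strip().lower()
    actual = file_digest(path, algorithm)
    if actual == expected:
        return "Pass", "Dataverse checksum {} verified".format(expected)
    return "Fail", "Dataverse checksum {} does not match {}".format(expected, actual)
```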
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
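&lt;br /&gt;
BagIt payload manifests list one checksum and one relative path per line. A minimal standard-library sketch that recomputes the SHA-512 digests listed in manifest-sha512.txt for an uncompressed bag; it does not check bag-info.txt, the tag manifest, or payload files missing from the manifest, which a full validator such as the Library of Congress bagit-python library handles. The function name is illustrative, and each file is read fully into memory for brevity.&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha512"):
    """Return payload paths that are missing or whose checksum differs.

    Each manifest line holds a checksum, whitespace, then a path relative
    to the bag root (e.g. "ab12... data/METS.xml").
    """
    bag = Path(bag_dir)
    manifest = bag / "manifest-{}.txt".format(algorithm)
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(rel_path)
    return problems
```
An empty list means every payload file listed in the manifest is present and unchanged.&lt;br /&gt;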
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json] &lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the Dataverse METS file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like us to use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to confirm whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs (Archival Information Collections) as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents; for instance, differentiating the researcher who uploaded files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12650</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12650"/>
		<updated>2018-09-12T15:20:57Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview of Dataverse to Archivematica Integration ==&lt;br /&gt;
&lt;br /&gt;
=== Feature Files ===&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
    When the user selects the transfer type “Dataverse” &lt;br /&gt;
      And the user selects the dataset to be preserved  &lt;br /&gt;
      And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
      And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
      And the user clicks the “Start Transfer” button&lt;br /&gt;
    Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
      And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
      And the user selects yes &lt;br /&gt;
      And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
      And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
      And the Verify Transfer Compliance microservice confirms this is a valid Dataverse transfer&lt;br /&gt;
      And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
      And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
      And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
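&lt;br /&gt;
Filtering a location on search terms or on one dataverse could map onto a Dataverse Search API query. The sketch below is a minimal example under that assumption. The q, type, subtree, per_page and start parameters belong to the Search API. The function name, the example server URL and the dataverse alias "weather" are hypothetical. This is not the Storage Service's own code.&lt;br /&gt;
```python
import urllib.parse


def dataset_search_url(server_url, query="*", subtree=None, per_page=10, start=0):
    """Build a Dataverse Search API URL that returns datasets only.

    query mirrors a location's search terms; subtree (a dataverse alias)
    limits results to one dataverse. Paging uses per_page and start.
    """
    params = {"q": query, "type": "dataset", "per_page": per_page, "start": start}
    if subtree:
        params["subtree"] = subtree
    return server_url.rstrip("/") + "/api/search?" + urllib.parse.urlencode(params)


# Hypothetical server and dataverse alias, for illustration only.
print(dataset_search_url("https://dataverse.example.org", query="YVR", subtree="weather"))
```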
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
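&lt;br /&gt;
The sketch below builds the request for a single dataset. It assumes the dataset is addressed by its persistent identifier (DOI). It also assumes the API token stored in the Space is sent in the X-Dataverse-key header, which is the header the Dataverse Native API reads. The server URL, DOI and token are placeholders. The request is built but not sent, and this is not the Storage Service's implementation.&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def build_dataset_request(server_url, persistent_id, api_token):
    """Build (but do not send) a Native API request for one dataset's JSON.

    The API token is passed in the X-Dataverse-key header.
    """
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    url = server_url.rstrip("/") + "/api/datasets/:persistentId/?" + query
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})


# Placeholder server, DOI and token.
req = build_dataset_request(
    "https://dataverse.example.org", "doi:10.5072/FK2/0MOPJM", "00000000-placeholder"
)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```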
&lt;br /&gt;
[3] '''Prepare Transfer''' &lt;br /&gt;
The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each data file.&lt;br /&gt;
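&lt;br /&gt;
The sketch below reads the file list out of the JSON. It assumes the Native API layout of a latestVersion object whose files entries each carry a dataFile with id, filename, contentType and md5. Key names (including the entity_id naming mentioned above) differ between Dataverse versions, so treat them as assumptions. The function name and the trimmed sample are hypothetical, and the checksums are placeholders.&lt;br /&gt;
```python
import json


def list_data_files(dataset_json_text):
    """Return (id, filename, content type, md5) for each data file in dataset.json.

    Assumed layout: {"latestVersion": {"files": [{"dataFile": {...}}, ...]}};
    key names differ between Dataverse versions.
    """
    dataset = json.loads(dataset_json_text)
    rows = []
    for entry in dataset["latestVersion"]["files"]:
        data_file = entry["dataFile"]
        rows.append((
            data_file["id"],
            data_file["filename"],
            data_file.get("contentType", ""),
            data_file.get("md5"),  # None if no checksum was supplied
        ))
    return rows


# Hypothetical, trimmed dataset.json with placeholder checksums.
sample = json.dumps({
    "latestVersion": {
        "files": [
            {"dataFile": {"id": 11, "filename": "Study_info.pdf",
                          "contentType": "application/pdf", "md5": "placeholder-1"}},
            {"dataFile": {"id": 12, "filename": "YVR_weather_data.tab",
                          "contentType": "text/tab-separated-values",
                          "md5": "placeholder-2"}},
        ]
    }
})
for row in list_data_files(sample):
    print(row)
```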
&lt;br /&gt;
[4] If the JSON file lists a data file with a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
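&lt;br /&gt;
The choice between a bundle and a single-file download can be sketched as a check on each data file's content type. This minimal sketch assumes the Dataverse Data Access API endpoints /api/access/datafile/ID and /api/access/datafile/bundle/ID. The function name and server URL are hypothetical, and the sketch does not reproduce Archivematica's own download code.&lt;br /&gt;
```python
TABULAR_CONTENT_TYPE = "text/tab-separated-values"


def access_url(server_url, file_id, content_type):
    """Choose a Data Access API URL for one data file.

    Tabular files are requested as a zipped bundle; all others as a single file.
    """
    base = server_url.rstrip("/") + "/api/access/datafile/"
    if content_type == TABULAR_CONTENT_TYPE:
        return base + "bundle/" + str(file_id)
    return base + str(file_id)


# prints https://dataverse.example.org/api/access/datafile/bundle/12
print(access_url("https://dataverse.example.org", 12, "text/tab-separated-values"))
# prints https://dataverse.example.org/api/access/datafile/11
print(access_url("https://dataverse.example.org", 11, "application/pdf"))
```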
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DDI elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. the original uploaded data file, derivative files generated for tabular data, and metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
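&lt;br /&gt;
The fileSec grouping can be sketched with Python's xml.etree.ElementTree. The sketch creates one fileGrp per USE value, and files unpacked from the same bundle share a GROUPID. This is a minimal sketch, not Archivematica's METS generator. The file IDs, paths and the Group-1 value are hypothetical, and the FLocat attributes are illustrative.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def build_file_sec(files):
    """Build a METS document holding only a fileSec.

    files: iterable of (file_id, path, use, group_id); group_id is None for
    files that were not unpacked from a Dataverse bundle.
    """
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    groups = {}  # one fileGrp per USE value, in first-seen order
    for file_id, path, use, group_id in files:
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
        attrs = {"ID": file_id}
        if group_id is not None:
            attrs["GROUPID"] = group_id
        file_el = ET.SubElement(groups[use], m("file"), attrs)
        ET.SubElement(file_el, m("FLocat"), {
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "SYSTEM",
            "{%s}href" % XLINK_NS: path,
        })
    return mets


# Hypothetical IDs and paths modelled on the sample dataset on this page.
mets = build_file_sec([
    ("file-1", "Study_info.pdf", "original", None),
    ("file-2", "YVR_weather_data/YVR_weather_data.sav", "original", "Group-1"),
    ("file-3", "YVR_weather_data/YVR_weather_data.tab", "derivative", "Group-1"),
    ("file-4", "dataset.json", "metadata", None),
])
print(ET.tostring(mets, encoding="unicode"))
```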
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the JSON file, plus the METS file generated in step [5].&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing and search/discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
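&lt;br /&gt;
The table's rules can be sketched as two small Python functions. The first assigns a fileGrp USE value from a filename. The second picks the study-level DDI values out of dataset.json. This minimal sketch makes three assumptions. First, it assumes the JSON nesting used in the sample (latestVersion, metadataBlocks, citation fields, and termsOfUse under latestVersion). Second, it assumes the caller already knows which bundle member is the original upload, for example from originalFileFormat. Third, the IDNo concatenation relies on the authority value ending in a slash, as in the table's example. The function names and dictionary keys are hypothetical.&lt;br /&gt;
```python
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def classify_use(filename, in_bundle, is_original_upload):
    """Assign a fileGrp USE value following the filename rules in the table."""
    if filename.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if in_bundle and not is_original_upload:
        return "derivative"  # TAB, RData and other files Dataverse generated
    return "original"


def ddi_values(dataset):
    """Collect study-level values for the DDI elements mapped in the table.

    ddi:AuthEnty is left out because authorName sits inside a compound
    citation field whose nesting is not shown here.
    """
    version = dataset["latestVersion"]
    title = next(
        field["value"]
        for field in version["metadataBlocks"]["citation"]["fields"]
        if field["typeName"] == "title"
    )
    return {
        "titl": title,
        # Plain concatenation assumes the authority ends with "/".
        "IDNo": dataset["authority"] + dataset["identifier"],
        "IDNo agency": dataset["protocol"],
        "distrbtr": dataset["publisher"],
        "version": "%s.%s" % (version["versionNumber"], version["versionMinorNumber"]),
        "version date": version.get("releaseTime"),
        "version type": version.get("versionState"),
        "restrctn": version.get("termsOfUse"),
    }


# Hypothetical, trimmed dataset.json using the example values from the table.
sample = {
    "protocol": "doi",
    "authority": "10.5072/FK2/",
    "identifier": "0MOPJM",
    "publisher": "Root Dataverse",
    "latestVersion": {
        "versionNumber": 1,
        "versionMinorNumber": 0,
        "versionState": "RELEASED",
        "metadataBlocks": {"citation": {"fields": [
            {"typeName": "title", "value": "Pacific weather patterns study"},
        ]}},
    },
}
print(ddi_values(sample)["IDNo"])  # 10.5072/FK2/0MOPJM
print(classify_use("YVR_weather_data.tab", True, False))  # derivative
```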
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
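&lt;br /&gt;
The paired relationships above ("is source of" on the original, "has source" on the derivative) can be generated together, so that each side always points at the other. This minimal sketch uses xml.etree.ElementTree. It assumes the PREMIS 3 namespace URI. The function names and UUID strings are hypothetical placeholders, and the sketch is not Archivematica's own PREMIS code.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the (assumed) PREMIS 3 namespace."""
    return "{%s}%s" % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    """Build one premis:relationship of type "derivation"."""
    rel = ET.Element(p("relationship"))
    ET.SubElement(rel, p("relationshipType")).text = "derivation"
    ET.SubElement(rel, p("relationshipSubType")).text = sub_type
    ident = ET.SubElement(rel, p("relatedObjectIdentification"))
    ET.SubElement(ident, p("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(ident, p("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def derivation_pair(original_uuid, derivative_uuid):
    """Return the relationship for the original file and for its derivative."""
    return (
        derivation_relationship("is source of", derivative_uuid),
        derivation_relationship("has source", original_uuid),
    )


# Placeholder UUIDs standing in for the SAV and TAB file UUIDs.
sav_rel, tab_rel = derivation_pair("uuid-of-sav-file", "uuid-of-tab-file")
print(ET.tostring(sav_rel, encoding="unicode"))
```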
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
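&lt;br /&gt;
The example event above records hashlib.sha256() in its eventDetail, while dataset.json supplies MD5 checksums for original files. Whichever algorithm Dataverse used must also be used for the comparison. Below is a minimal sketch of the comparison, assuming an MD5 value taken from dataset.json. The function names are hypothetical, and this is not Archivematica's microservice code.&lt;br /&gt;
```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, dataverse_md5):
    """Compare a file with the MD5 value Dataverse supplied in dataset.json.

    Returns an (eventOutcome, eventOutcomeDetailNote) pair.
    """
    actual = md5_of(path)
    if actual == dataverse_md5.strip().lower():
        return "Pass", "Dataverse checksum %s verified" % dataverse_md5
    return "Fail", "Dataverse checksum %s does not match %s" % (dataverse_md5, actual)
```
Calling fixity_check once per original file yields values that could populate eventOutcome and eventOutcomeDetailNote.&lt;br /&gt;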
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset, and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
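&lt;br /&gt;
The sketch below checks the payload manifest of an uncompressed bag (a compressed .7z AIP would need extracting first). It uses only Python's standard library. It assumes each line of manifest-sha512.txt holds a hex digest followed by whitespace and a path relative to the bag root, which is the BagIt manifest line format. It does not handle percent-encoded paths or the tag manifest; a maintained library such as bagit-python would be the more complete choice. The function name is hypothetical.&lt;br /&gt;
```python
import hashlib
import os


def verify_manifest(bag_dir, manifest_name="manifest-sha512.txt"):
    """Check each payload file listed in a BagIt manifest.

    Each manifest line holds a hex digest, whitespace, then a path relative
    to the bag root. Returns the paths whose digests do not match.
    """
    algorithm = manifest_name[len("manifest-"):-len(".txt")]  # e.g. "sha512"
    failures = []
    with open(os.path.join(bag_dir, manifest_name), encoding="utf-8") as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, rel_path), "rb") as payload:
                for chunk in iter(lambda: payload.read(1024 * 1024), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                failures.append(rel_path)
    return failures
```
An empty list means every payload file listed in the manifest still matches its recorded digest.&lt;br /&gt;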
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json] &lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information a bout the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is also an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see us use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team. &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this is a new feature that needs to be thought through; there could be solutions other than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance distinguishing the researcher who uploaded the files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include the equivalent of a fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
A Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within the file, and then computes a cryptographic hash of that normalized (or canonicalized) representation.&lt;br /&gt;
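&lt;br /&gt;
To make the distinction concrete, here is a toy sketch of the normalize-then-hash idea: it renders each numeric value in a fixed exponential form and hashes that rendering rather than the raw file bytes. It is loosely modeled on the UNF approach (rounding to 7 significant digits, SHA-256 truncated to 128 bits, base64 output) but it is not the UNF v6 algorithm and will not reproduce real UNF values; the helper names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import base64
import hashlib


def canonical_value(value, digits=7):
    # Hypothetical helper: round to `digits` significant digits and use a
    # fixed exponential notation, so '2.5' and '2.50' render identically.
    return format(float(value), '.{}e'.format(digits - 1))


def toy_fingerprint(values, digits=7):
    # Hash the canonical rendering of the values, not the bytes of a file.
    digest = hashlib.sha256()
    for value in values:
        # End each value with a newline and a null byte so values stay separate.
        digest.update(canonical_value(value, digits).encode('ascii') + b'\n\x00')
    # Keep the leftmost 128 bits of the hash and base64-encode them.
    return base64.b64encode(digest.digest()[:16]).decode('ascii')


# The same three numbers, written differently in two exports of a data file.
export_a = ['1.0', '2.50', '3']
export_b = ['1', '2.5', '3.000']

raw_a = hashlib.md5(','.join(export_a).encode('ascii')).hexdigest()
raw_b = hashlib.md5(','.join(export_b).encode('ascii')).hexdigest()

print(raw_a == raw_b)                                          # False
print(toy_fingerprint(export_a) == toy_fingerprint(export_b))  # True
```
&lt;br /&gt;
A byte-level checksum changes when the same values are re-encoded, while a fingerprint over normalized values does not, so verifying UNFs would answer a different question from the checksum-based fixity check.&lt;br /&gt;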
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12649</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12649"/>
		<updated>2018-09-12T15:20:11Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Feature Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of the work, and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview of Dataverse to Archivematica Integration ==&lt;br /&gt;
&lt;br /&gt;
=== Feature Files ===&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms that the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
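&lt;br /&gt;
A minimal sketch of the kind of request involved, using only the Python standard library. It assumes the Dataverse Native API call for fetching a dataset by persistent identifier (/api/datasets/:persistentId/?persistentId=...) and the X-Dataverse-key header for the API token; it is not necessarily the exact call the Storage Service makes, and the server, DOI and token are placeholders.&lt;br /&gt;
&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def dataset_request(server, persistent_id, api_token):
    # Build, but do not send, a Native API request for one dataset.
    # All three arguments are placeholders in this sketch.
    query = urllib.parse.urlencode({'persistentId': persistent_id})
    url = '{}/api/datasets/:persistentId/?{}'.format(server.rstrip('/'), query)
    request = urllib.request.Request(url)
    # Dataverse accepts the API token in the X-Dataverse-key header.
    request.add_header('X-Dataverse-key', api_token)
    return request


req = dataset_request('https://dataverse.example.org/', 'doi:10.5072/FK2/0MOPJM', 'example-token')
print(req.full_url)
# urllib.request.urlopen(req) would send the request and return the JSON body.
```
&lt;br /&gt;
Sending the token as a header rather than as a query parameter keeps it out of the URL.&lt;br /&gt;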
&lt;br /&gt;
[3] '''Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
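&lt;br /&gt;
A minimal sketch of reading the per-file information out of dataset.json. It assumes a layout like the Dataverse Native API's dataset version JSON, in which each entry of a files list holds a dataFile object with id, filename, contentType and md5; the exact layout of the dataset.json that Archivematica stores may differ, and the sample values (including the checksums) are placeholders.&lt;br /&gt;
&lt;br /&gt;
```python
import json

# A trimmed, hypothetical dataset.json. The layout (a 'files' list whose
# entries hold a 'dataFile' object) is assumed from the Native API.
SAMPLE_DATASET_JSON = '''
{
  "versionNumber": 1,
  "versionMinorNumber": 0,
  "files": [
    {"dataFile": {"id": 101, "filename": "Study_info.pdf",
                  "contentType": "application/pdf",
                  "md5": "placeholder-md5-for-pdf"}},
    {"dataFile": {"id": 102, "filename": "YVR_weather_data.tab",
                  "contentType": "text/tab-separated-values",
                  "originalFileFormat": "application/x-spss-sav",
                  "md5": "placeholder-md5-for-tab"}}
  ]
}
'''


def list_data_files(dataset):
    # Return (entity id, filename, content type, md5) for every data file.
    rows = []
    for entry in dataset.get('files', []):
        data_file = entry['dataFile']
        rows.append((data_file['id'], data_file['filename'],
                     data_file['contentType'], data_file.get('md5')))
    return rows


dataset = json.loads(SAMPLE_DATASET_JSON)
for row in list_data_files(dataset):
    print(row)
```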
&lt;br /&gt;
[4] If the json file gives a data file a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For such TSV files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object that Dataverse documents as containing the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format).&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual].&lt;br /&gt;
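&lt;br /&gt;
The decision in step [4] can be sketched as a choice between two Data Access API URLs. The sketch assumes the endpoints /api/access/datafile/{id} (single file) and /api/access/datafile/bundle/{id} (bundle), and treats a content type of text/tab-separated-values as the marker of an ingested tabular file; the server URL and file ids are placeholders and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
TABULAR_TYPE = 'text/tab-separated-values'


def download_url(server, file_id, content_type):
    # Ingested tabular files are fetched as a zipped bundle; every other
    # file is fetched on its own. Parameters such as '; charset=...' are ignored.
    base = server.rstrip('/')
    if content_type.split(';')[0].strip() == TABULAR_TYPE:
        return '{}/api/access/datafile/bundle/{}'.format(base, file_id)
    return '{}/api/access/datafile/{}'.format(base, file_id)


for file_id, ctype in [(101, 'application/pdf'), (102, 'text/tab-separated-values')]:
    print(download_url('https://dataverse.example.org', file_id, ctype))
```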
&lt;br /&gt;
[5] Archivematica generates a Dataverse METS file consisting of a dmdSec containing DDI elements extracted from the json file, plus a fileSec and structMap indicating the relationships between the files in the transfer (e.g. the original uploaded data file, derivative files generated for tabular data, and metadata/citation files). This allows Archivematica to apply appropriate preservation micro-services to different file types and to provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step [5].&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
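&lt;br /&gt;
The file grouping described above can be sketched with Python's xml.etree.ElementTree: each file name is assigned a fileGrp USE value using the rules from the mapping table in this section, and all files unpacked from one bundle share a GROUPID. This is an illustration, not Archivematica's implementation; the ID and GROUPID formats, the FLocat attributes and the helper names are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

METS_NS = 'http://www.loc.gov/METS/'
XLINK_NS = 'http://www.w3.org/1999/xlink'
ET.register_namespace('mets', METS_NS)
ET.register_namespace('xlink', XLINK_NS)

METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_use(name, bundle_original=None):
    # Apply the fileGrp rules from the mapping table. `bundle_original` is the
    # researcher's original file when `name` came from a bundle, else None.
    if name.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    if bundle_original is None or name == bundle_original:
        return 'original'
    return 'derivative'


def build_file_sec(loose_files, bundles):
    # `bundles` maps each bundle's original file name to all of its members.
    groups = {}
    for name in loose_files:
        groups.setdefault(file_use(name), []).append((name, None))
    for original, members in bundles.items():
        group_id = 'Group-{}'.format(uuid.uuid4())  # shared by the whole bundle
        for name in members:
            groups.setdefault(file_use(name, original), []).append((name, group_id))

    file_sec = ET.Element('{%s}fileSec' % METS_NS)
    for use in ('original', 'derivative', 'metadata'):
        grp = ET.SubElement(file_sec, '{%s}fileGrp' % METS_NS, USE=use)
        for name, group_id in groups.get(use, []):
            entry = ET.SubElement(grp, '{%s}file' % METS_NS,
                                  ID='file-{}'.format(uuid.uuid4()))
            if group_id:
                entry.set('GROUPID', group_id)
            flocat = ET.SubElement(entry, '{%s}FLocat' % METS_NS,
                                   LOCTYPE='OTHER', OTHERLOCTYPE='SYSTEM')
            flocat.set('{%s}href' % XLINK_NS, name)
    return file_sec


bundle = ['YVR_weather_data.sav', 'YVR_weather_data.tab', 'YVR_weather_data.RData',
          'YVR_weather_data-ddi.xml', 'YVR_weather_datacitation-endnote.xml',
          'YVR_weather_datacitation-ris.ris']
file_sec = build_file_sec(['dataset.json', 'Study_info.pdf'],
                          {'YVR_weather_data.sav': bundle})
for grp in file_sec:
    print(grp.get('USE'), [f[0].get('{%s}href' % XLINK_NS) for f in grp])
```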
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing &amp;amp; search / discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
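&lt;br /&gt;
The descriptive rows of the table can be read as a small mapping function. The sketch below assumes a flattened dataset.json (citation fields as a list of typeName/value pairs with a single authorName) rather than Dataverse's real nested citation block, builds the DDI IDNo by simply concatenating authority and identifier, and renders the version as versionNumber.versionMinorNumber. All sample values are placeholders and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical, flattened dataset.json values; field names follow the table.
SAMPLE = {
    'citation': [
        {'typeName': 'title', 'value': 'Pacific weather patterns study'},
        {'typeName': 'authorName', 'value': 'Researcher, Example'},
    ],
    'protocol': 'doi',
    'authority': '10.5072/FK2/',
    'identifier': '0MOPJM',
    'publisher': 'Root Dataverse',
    'releaseTime': '2018-05-01T00:00:00Z',
    'versionState': 'RELEASED',
    'versionNumber': 1,
    'versionMinorNumber': 0,
    'termsOfUse': 'Example terms of use',
}


def dataset_to_ddi(ds):
    # Return {DDI element: (text, attributes)} following the crosswalk table.
    # Only one author is kept: later authorName entries overwrite earlier ones.
    citation = {field['typeName']: field['value'] for field in ds['citation']}
    return {
        'titl': (citation['title'], {}),
        # Assumes the authority already ends with '/', as in the table's example.
        'IDNo': (ds['authority'] + ds['identifier'], {'agency': ds['protocol']}),
        'AuthEnty': (citation['authorName'], {}),
        'distrbtr': (ds['publisher'], {}),
        'version': ('{}.{}'.format(ds['versionNumber'], ds['versionMinorNumber']),
                    {'date': ds['releaseTime'], 'type': ds['versionState']}),
        'restrctn': (ds.get('termsOfUse', ''), {}),
    }


for element, (text, attrs) in dataset_to_ddi(SAMPLE).items():
    print(element, text, attrs)
```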
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
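&lt;br /&gt;
The two relationship examples above mirror each other. A minimal sketch that builds both from a pair of UUIDs with Python's xml.etree.ElementTree follows; the PREMIS 3 namespace URI and the helper names are assumptions, and this is not Archivematica's own METS-generation code.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

PREMIS_NS = 'http://www.loc.gov/premis/v3'  # assumed PREMIS 3 namespace URI
ET.register_namespace('premis', PREMIS_NS)


def q(tag):
    # Qualify a tag name with the PREMIS namespace for ElementTree.
    return '{%s}%s' % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    # Build one premis:relationship of type 'derivation' pointing at related_uuid.
    rel = ET.Element(q('relationship'))
    ET.SubElement(rel, q('relationshipType')).text = 'derivation'
    ET.SubElement(rel, q('relationshipSubType')).text = sub_type
    ident = ET.SubElement(rel, q('relatedObjectIdentification'))
    ET.SubElement(ident, q('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(ident, q('relatedObjectIdentifierValue')).text = related_uuid
    return rel


def reciprocal_relationships(original_uuid, derivative_uuid):
    # The original 'is source of' the derivative; the derivative 'has source'.
    return (derivation_relationship('is source of', derivative_uuid),
            derivation_relationship('has source', original_uuid))


sav_uuid, tab_uuid = str(uuid.uuid4()), str(uuid.uuid4())
for_sav, for_tab = reciprocal_relationships(sav_uuid, tab_uuid)
print(ET.tostring(for_sav, encoding='unicode'))
print(ET.tostring(for_tab, encoding='unicode'))
```
&lt;br /&gt;
Generating both elements in one call keeps the pair consistent, so an original never points at a derivative that does not point back.&lt;br /&gt;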
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
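&lt;br /&gt;
A minimal sketch of the comparison behind this event, assuming the checksum supplied in dataset.json is an MD5 value (as the md5 field in the mapping table suggests). The function names are hypothetical and this is not Archivematica's microservice code; it only shows how a Pass or Fail outcome and its detail note could be derived.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1024 * 1024):
    # Read the file in chunks so large data files are not loaded at once.
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, dataverse_md5):
    # Return the PREMIS-style outcome ('Pass' or 'Fail') and a detail note.
    computed = md5_of(path)
    if computed == dataverse_md5.lower():
        return 'Pass', 'Dataverse checksum {} verified'.format(dataverse_md5)
    return 'Fail', 'Dataverse checksum {} does not match {}'.format(dataverse_md5, computed)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Study_info.pdf')
    with open(path, 'wb') as handle:
        handle.write(b'example bytes')
    expected = hashlib.md5(b'example bytes').hexdigest()
    good = fixity_outcome(path, expected)
    bad = fixity_outcome(path, '0' * 32)

print(good[0], bad[0])  # Pass Fail
```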
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset, and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
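&lt;br /&gt;
An AIP stored as a .7z file has to be extracted before it can be checked. Assuming an extracted bag directory, the sketch below re-checks the payload against manifest-sha512.txt, each line of which holds a checksum followed by a path relative to the bag root (the BagIt manifest layout). It ignores tag manifests and BagIt's percent-encoding of special characters in paths; for full validation, the Library of Congress bagit-python library is one option. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os
import tempfile


def verify_manifest(bag_dir, algorithm='sha512'):
    # Re-hash every payload file listed in manifest-ALGORITHM.txt and
    # return the relative paths whose checksum no longer matches.
    manifest = os.path.join(bag_dir, 'manifest-{}.txt'.format(algorithm))
    failures = []
    with open(manifest, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, rel_path), 'rb') as payload:
                for chunk in iter(lambda: payload.read(1024 * 1024), b''):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as bag:
    os.makedirs(os.path.join(bag, 'data', 'objects'))
    target = os.path.join(bag, 'data', 'objects', 'chelan_052.jpg')
    with open(target, 'wb') as handle:
        handle.write(b'not really a jpeg')
    checksum = hashlib.sha512(b'not really a jpeg').hexdigest()
    with open(os.path.join(bag, 'manifest-sha512.txt'), 'w', encoding='utf-8') as handle:
        handle.write('{}  data/objects/chelan_052.jpg\n'.format(checksum))
    before = verify_manifest(bag)
    with open(target, 'ab') as handle:
        handle.write(b'!')  # simulate corruption of the payload file
    after = verify_manifest(bag)

print(before, after)  # [] ['data/objects/chelan_052.jpg']
```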
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json] &lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file is structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the Dataverse METS file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
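To check which file groups a given AIP METS file actually contains, a short script can count the mets:file entries under each fileGrp USE value. This is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the standard METS namespace (http://www.loc.gov/METS/). The function name count_files_by_use is hypothetical and is not part of Archivematica.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET
from collections import Counter

METS_NS = 'http://www.loc.gov/METS/'  # standard METS namespace URI


def count_files_by_use(root):
    # root: the parsed METS root element, e.g. ET.parse(path).getroot().
    # Counts the direct mets:file children of every mets:fileGrp, keyed by USE.
    counts = Counter()
    for grp in root.iter('{%s}fileGrp' % METS_NS):
        use = grp.get('USE', 'unspecified')
        counts[use] += len(grp.findall('{%s}file' % METS_NS))
    return dict(counts)
```
&lt;br /&gt;
Run against a Dataverse-derived AIP, the result should include the original, derivative, submissionDocumentation, preservation and metadata groups listed above, whenever those groups are present.&lt;br /&gt;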
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned.
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and she wasn’t sure it would be a ‘safe’ option.
Amaz would like us to use either a standard API (like OAI-PMH) or a versioned API.
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this to vary quite a bit by institution.
We discussed the question of whether all datasets in a dataverse would be preserved (not currently known), which brought up the question of how to relate datasets. 
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through; there could be other solutions than AICs.
&lt;br /&gt;
'''Improving agent info in event history in METS'''
We pointed out that having an agent other than Archivematica in the METS is a new feature.
We discussed making this even more specific by adding more agents, for instance differentiating between the researcher who uploaded files and the research data manager who published the dataset.
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12648</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12648"/>
		<updated>2018-09-12T15:15:50Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* AIP structure */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work, and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
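The retrieval in step [2] is keyed on the dataset's persistent identifier. The dataset JSON stores this identifier as separate protocol, authority and identifier fields (see the metadata sources table on this page). The following is a minimal sketch of assembling the identifier and a Native API request URL. The /api/datasets/:persistentId/ endpoint comes from the Dataverse Native API guide. The server URL and the helper names persistent_id and dataset_url are hypothetical, and this is not the Storage Service's own code. An API token, when needed, is normally sent in the X-Dataverse-key HTTP header.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def persistent_id(protocol, authority, identifier):
    # Join the JSON fields into an ID such as 'doi:10.5072/FK2/0MOPJM'.
    # The example authority in the table already ends with '/', so only
    # add a separator when it is missing.
    if authority and not authority.endswith('/'):
        authority += '/'
    return '%s:%s%s' % (protocol, authority, identifier)


def dataset_url(server, pid):
    # Native API lookup by persistent ID; the query value is URL-encoded.
    query = urlencode({'persistentId': pid})
    return '%s/api/datasets/:persistentId/?%s' % (server.rstrip('/'), query)
```
&lt;br /&gt;
For the table's example values, persistent_id('doi', '10.5072/FK2/', '0MOPJM') yields doi:10.5072/FK2/0MOPJM.&lt;br /&gt;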
&lt;br /&gt;
[3] '''Prepare Transfer'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file gives a file's content type as tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For TSV files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
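Step [4] comes down to a per-file decision based on the content type recorded in the JSON. The following is a minimal sketch of that decision. It assumes each file entry carries id and contentType keys. It also assumes the server offers the Data Access API endpoints /api/access/datafile/{id} for single files and /api/access/datafile/bundle/{id} for bundles. The helper name download_url is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
TSV_TYPE = 'text/tab-separated-values'


def download_url(server, datafile):
    # datafile: one file entry from the dataset JSON (assumed keys 'id', 'contentType').
    base = server.rstrip('/')
    file_id = datafile['id']
    # Ignore any parameters such as '; charset=...' when comparing types.
    content_type = datafile.get('contentType', '').split(';')[0].strip()
    if content_type == TSV_TYPE:
        # Tabular file ingested by Dataverse: request the zipped bundle.
        return '%s/api/access/datafile/bundle/%s' % (base, file_id)
    # Any other file is downloaded on its own.
    return '%s/api/access/datafile/%s' % (base, file_id)
```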
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing the following files:&lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual].&lt;br /&gt;
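Once a bundle is unpacked, the file-group rules in the metadata sources table on this page can be applied by file name alone. The following is a minimal sketch of those rules. It assumes the name of the researcher's original upload is already known, for example from the dataset JSON. The function name classify_bundle_file is hypothetical, and this is not Archivematica's own implementation.&lt;br /&gt;
&lt;br /&gt;
```python
# Suffixes the table assigns to fileGrp USE='metadata'.
METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def classify_bundle_file(name, original_name):
    # The researcher's upload is checked first so that an original file
    # with a metadata-like extension still lands in 'original'.
    if name == original_name:
        return 'original'
    if name.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    # Everything else in the bundle (TAB, RData, ...) is a derivative.
    return 'derivative'
```
&lt;br /&gt;
Applied to the sample bundle below, this puts the SAV file in original, the TAB and RData files in derivative, and the DDI, EndNote and RIS files in metadata.&lt;br /&gt;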
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing descriptive metadata extracted from the JSON file (mapped to DDI, as described in the Dataverse METS file section), and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
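The following minimal sketch illustrates how the fileSec and structMap relate. It uses Python's xml.etree.ElementTree to build a skeleton METS with one fileGrp per USE value and a structMap div that points at each file through its ID. Several details are assumptions made for illustration: the ID scheme (file-1, file-2, ...), the helper name build_mets and the flat structMap. The Dataverse METS that Archivematica generates carries more detail, such as dmdSecs, checksums and GROUPIDs.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS = 'http://www.loc.gov/METS/'
XLINK = 'http://www.w3.org/1999/xlink'
ET.register_namespace('mets', METS)
ET.register_namespace('xlink', XLINK)


def q(ns, tag):
    # Namespaced tag in ElementTree's '{uri}name' form.
    return '{%s}%s' % (ns, tag)


def build_mets(files):
    # files: list of (path, use) pairs; use is 'original', 'derivative' or 'metadata'.
    root = ET.Element(q(METS, 'mets'))
    file_sec = ET.SubElement(root, q(METS, 'fileSec'))
    struct_map = ET.SubElement(root, q(METS, 'structMap'), TYPE='physical')
    top_div = ET.SubElement(struct_map, q(METS, 'div'), TYPE='Directory', LABEL='objects')
    groups = {}
    for n, (path, use) in enumerate(files, start=1):
        # Create each fileGrp the first time its USE value appears.
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, q(METS, 'fileGrp'), USE=use)
        file_id = 'file-%d' % n
        f = ET.SubElement(groups[use], q(METS, 'file'), ID=file_id)
        ET.SubElement(f, q(METS, 'FLocat'),
                      {q(XLINK, 'href'): path, 'LOCTYPE': 'OTHER', 'OTHERLOCTYPE': 'SYSTEM'})
        # The structMap refers back to the file by its ID.
        div = ET.SubElement(top_div, q(METS, 'div'), TYPE='Item', LABEL=path)
        ET.SubElement(div, q(METS, 'fptr'), FILEID=file_id)
    return root
```
&lt;br /&gt;
ET.tostring(build_mets(...), encoding='unicode') serializes the result with mets: and xlink: prefixes.&lt;br /&gt;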
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The endnote and ris files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI xml file.&lt;br /&gt;
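The GROUPID convention works like this: every file unpacked from the same bundle receives one shared identifier, and loose files receive none. The following is a minimal sketch. It assumes each bundle is unpacked into its own top-level directory. It uses random UUIDs with a Group- prefix as identifiers, which is an illustrative choice; the values Archivematica writes may differ. The function name assign_group_ids is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
from pathlib import PurePosixPath


def assign_group_ids(paths, bundle_dirs):
    # paths: file paths relative to the objects directory.
    # bundle_dirs: names of the directories created by unpacking bundles.
    # Returns {path: group_id or None}; files from one bundle share an ID.
    group_for_bundle = {}
    result = {}
    for p in paths:
        bundle = str(PurePosixPath(p).parent)
        if bundle in bundle_dirs:
            if bundle not in group_for_bundle:
                group_for_bundle[bundle] = 'Group-%s' % uuid.uuid4()
            result[p] = group_for_bundle[bundle]
        else:
            result[p] = None  # not from a bundle, so no GROUPID
    return result
```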
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing, search and discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] gives the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
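Several rows of this table can be expressed as a small mapping function. The following is a minimal sketch, not Archivematica's code. It assumes a simplified JSON layout in which citation fields appear as a list of typeName/value entries and the other keys sit at the top level. That layout matches the table's notation but may not match every Dataverse version. Two further details are assumptions: the sketch forms ddi:IDNo by concatenating authority and identifier, and it joins versionNumber and versionMinorNumber with a dot. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def citation_value(dataset, type_name):
    # Find a citation field by its typeName, e.g. 'title'.
    for field in dataset.get('citation', []):
        if field.get('typeName') == type_name:
            return field.get('value')
    return None


def ddi_fields(dataset):
    # Keys name the DDI element, with '@' marking an attribute of it.
    return {
        'titl': citation_value(dataset, 'title'),
        'IDNo': dataset['authority'] + dataset['identifier'],
        'IDNo@agency': dataset['protocol'],
        'distrbtr': dataset.get('publisher'),
        'version': '%s.%s' % (dataset['versionNumber'], dataset['versionMinorNumber']),
        'version@date': dataset.get('releaseTime'),
        'version@type': dataset.get('versionState'),
        'restrctn': dataset.get('termsOfUse'),
    }
```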
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
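The two relationship blocks for the original and derivative files are mirror images, so they can be generated together from the two object UUIDs. The following is a minimal sketch using xml.etree.ElementTree. It assumes the PREMIS 3 namespace (http://www.loc.gov/premis/v3). The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS = 'http://www.loc.gov/premis/v3'
ET.register_namespace('premis', PREMIS)


def p(tag):
    # Namespaced PREMIS tag in ElementTree's '{uri}name' form.
    return '{%s}%s' % (PREMIS, tag)


def derivation_relationship(subtype, related_uuid):
    # One premis:relationship pointing at the related object by UUID.
    rel = ET.Element(p('relationship'))
    ET.SubElement(rel, p('relationshipType')).text = 'derivation'
    ET.SubElement(rel, p('relationshipSubType')).text = subtype
    ident = ET.SubElement(rel, p('relatedObjectIdentification'))
    ET.SubElement(ident, p('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(ident, p('relatedObjectIdentifierValue')).text = related_uuid
    return rel


def relationship_pair(original_uuid, derivative_uuid):
    # The original 'is source of' the derivative; the derivative 'has source' the original.
    return (derivation_relationship('is source of', derivative_uuid),
            derivation_relationship('has source', original_uuid))
```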
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
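The fixity check recomputes a digest and compares it with the value Dataverse supplied. This minimal sketch takes the hash algorithm as a parameter. It defaults to MD5 because the dataset JSON records md5 values (see the CHECKSUM row of the table above). The function names and the outcome strings are assumptions and do not reproduce Archivematica's exact event output.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def file_digest(path, algorithm='md5', chunk_size=1024 * 1024):
    # Stream the file in chunks so large data files are not read into memory at once.
    h = hashlib.new(algorithm)
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def fixity_check(path, expected, algorithm='md5'):
    # Returns (outcome, detail note), loosely mirroring the PREMIS event above.
    actual = file_digest(path, algorithm)
    if actual == expected.strip().lower():
        return 'Pass', 'Dataverse checksum %s verified' % expected
    return 'Fail', 'Dataverse checksum %s does not match computed %s' % (expected, actual)
```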
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse).&lt;br /&gt;
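Extracting the per-file checksums from dataset.json is a small traversal. This minimal sketch assumes the layout returned by the Dataverse Native API for a dataset version, where each entry in latestVersion.files wraps a dataFile object with filename and md5 keys. The stored dataset.json may be laid out differently depending on the Dataverse version. The fallback to a checksum object with type and value keys is also an assumption, for newer versions. Files without an MD5 value are skipped. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def checksums_from_dataset(dataset):
    # Accept either the full API response ({'data': {...}}) or the inner object.
    dataset = dataset.get('data', dataset)
    result = {}
    for entry in dataset.get('latestVersion', {}).get('files', []):
        data_file = entry.get('dataFile', {})
        md5 = data_file.get('md5')
        if md5 is None:
            # Assumed newer form: {'checksum': {'type': 'MD5', 'value': ...}}.
            checksum = data_file.get('checksum') or {}
            if checksum.get('type', '').upper() == 'MD5':
                md5 = checksum.get('value')
        if md5:
            result[data_file.get('filename')] = md5
    return result


def load_checksums(path):
    # Read dataset.json from disk and return {filename: md5}.
    with open(path, encoding='utf-8') as fh:
        return checksums_from_dataset(json.load(fh))
```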
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
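Because the AIP is a BagIt bag, its payload can be checked against manifest-sha512.txt without Archivematica. The following is a minimal sketch using only the standard library. It assumes an uncompressed bag directory and the BagIt manifest line format: a checksum, whitespace, then a path relative to the bag root. The function name verify_manifest is hypothetical. For production use, the Library of Congress bagit-python tool is the usual choice.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os


def verify_manifest(bag_dir, manifest='manifest-sha512.txt', algorithm='sha512'):
    # Returns the payload paths that are missing or whose digest does not
    # match the manifest; an empty list means the payload verified.
    problems = []
    with open(os.path.join(bag_dir, manifest), encoding='utf-8') as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            full = os.path.join(bag_dir, rel_path)
            if not os.path.isfile(full):
                problems.append(rel_path)
                continue
            h = hashlib.new(algorithm)
            with open(full, 'rb') as payload:
                for chunk in iter(lambda: payload.read(1024 * 1024), b''):
                    h.update(chunk)
            if h.hexdigest() != expected.lower():
                problems.append(rel_path)
    return problems
```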
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   ├── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [see Dataverse#agents.json] &lt;br /&gt;
│       │           ├── dataset.json [see Dataverse#dataset.json]&lt;br /&gt;
│       │           └── METS.xml [see Dataverse#Dataverse_METS_file]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [the standard Transfer METS file described above]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section collects working notes for future phases as interesting opportunities or questions arise. At the end of the current phase we will document the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. 
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and she wasn’t sure it would be a ‘safe’ option. 
Amaz would like to see us using either a standard API (like OAI-PMH) or a versioned API. 
(Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.)
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. 
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to each other. 
We talked about AICs (Archival Information Collections) as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be solutions other than AICs. 
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.
We discussed making this even more specific by adding more agents, for instance distinguishing the researcher who uploaded files from the research data manager who published the dataset. 
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12647</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12647"/>
		<updated>2018-09-12T15:10:34Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work, and any outstanding issues, please see the Waffle board(s) linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” Button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
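&lt;br /&gt;
The search-term and dataverse filters described above could map onto a Dataverse Search API query. The sketch below only builds the URL. It assumes the documented /api/search endpoint with its q, type and subtree parameters. build_search_url is a hypothetical name, and the Storage Service's own queries may well differ.&lt;br /&gt;

```python
from urllib.parse import urlencode


def build_search_url(server_url, query, subtree=None):
    """Build a Dataverse Search API URL that returns datasets only.

    Hypothetical helper: assumes the /api/search endpoint and its
    q, type and subtree parameters.
    """
    params = {"q": query, "type": "dataset"}
    if subtree:
        # Limit results to one dataverse, identified by its alias.
        params["subtree"] = subtree
    return server_url.rstrip("/") + "/api/search?" + urlencode(params)
```

Passing subtree is what would narrow a Transfer Source location to a single dataverse. Leaving it out searches the whole installation.&lt;br /&gt;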
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
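&lt;br /&gt;
The following is a minimal sketch of how such a retrieval request might be prepared in Python. It assumes the Dataverse native API endpoint /api/datasets/:persistentId/ and its X-Dataverse-key header for the API token. The function name dataset_request is hypothetical, and the Storage Service may call different endpoints (for example the Search and Access APIs).&lt;br /&gt;

```python
import urllib.request
from urllib.parse import urlencode


def dataset_request(server_url, persistent_id, api_token):
    """Prepare, but do not send, a request for a dataset's JSON metadata.

    Hypothetical helper: assumes the Dataverse native API endpoint
    /api/datasets/:persistentId/ and the X-Dataverse-key token header.
    """
    query = urlencode({"persistentId": persistent_id})
    url = server_url.rstrip("/") + "/api/datasets/:persistentId/?" + query
    # The API token would come from the Storage Service Space configuration.
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})
```

Passing the result to urllib.request.urlopen would perform the actual call. The sketch stops short of the network request so that the URL and token handling can be checked on their own.&lt;br /&gt;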
&lt;br /&gt;
[3] '''Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each data file.&lt;br /&gt;
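&lt;br /&gt;
The file list and checksums could be read from the JSON as sketched below. This assumes the Dataverse 4.x native API layout, in which each entry under latestVersion.files carries a dataFile object with id, filename, contentType and md5 keys. The entity_id naming above may reflect an older export format, and list_data_files is a hypothetical helper.&lt;br /&gt;

```python
import json


def list_data_files(dataset_json):
    """Return (id, filename, contentType, md5) tuples for each data file.

    Assumes the Dataverse 4.x layout latestVersion.files[].dataFile. The
    full API response wraps this in a "data" object, unwrapped here.
    """
    dataset = json.loads(dataset_json)
    dataset = dataset.get("data", dataset)
    files = []
    for entry in dataset.get("latestVersion", {}).get("files", []):
        data_file = entry.get("dataFile", {})
        files.append((
            data_file.get("id"),
            data_file.get("filename"),
            data_file.get("contentType"),
            data_file.get("md5"),  # None if no MD5 was supplied
        ))
    return files
```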
&lt;br /&gt;
[4] If the JSON file lists a file with a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For such tabular files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
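&lt;br /&gt;
The content-type test and the bundle request might look like the sketch below. It assumes two things: the Dataverse Data Access API bundle endpoint /api/access/datafile/bundle/{id}, and the text/tab-separated-values content type that Dataverse assigns to ingested tabular files. bundle_urls is a hypothetical name, and non-tabular files would still be fetched individually.&lt;br /&gt;

```python
TABULAR_TYPE = "text/tab-separated-values"


def bundle_urls(server_url, data_files):
    """Map each tabular data file id to its bundle download URL.

    data_files is a list of (id, content_type) pairs. Assumes the Data
    Access API bundle endpoint; other files are left out of the result.
    """
    base = server_url.rstrip("/")
    return {
        file_id: f"{base}/api/access/datafile/bundle/{file_id}"
        for file_id, content_type in data_files
        if content_type == TABULAR_TYPE
    }
```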
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
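&lt;br /&gt;
Once a bundle has been downloaded, Python's standard zipfile module can unpack its members. The sketch below is illustrative only. unpack_bundle is a hypothetical helper, and a production tool would also inspect member paths before extracting an archive obtained from an external source.&lt;br /&gt;

```python
import io
import zipfile


def unpack_bundle(bundle_bytes, target_dir):
    """Extract a downloaded bundle into target_dir and return member names.

    Directory entries are left out of the returned, sorted list.
    """
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        names = sorted(n for n in bundle.namelist() if not n.endswith("/"))
        bundle.extractall(target_dir)
    return names
```

In the AIP structure shown on this page, each bundle ends up in its own directory (for example Weather_data), so target_dir would typically be named after the bundle.&lt;br /&gt;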
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DDI elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
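&lt;br /&gt;
Python's xml.etree.ElementTree can assemble a fileSec of this kind, as the minimal sketch below shows. It is not Archivematica's METS writer. The build_file_sec helper, the file IDs, and the LOCTYPE OTHER / OTHERLOCTYPE SYSTEM location attributes are assumptions chosen for illustration.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_file_sec(groups):
    """Build a mets:fileSec with one fileGrp per USE value.

    groups maps a USE value (original, derivative, metadata) to a list of
    (file_id, relative_path, group_id_or_None) tuples.
    """
    file_sec = ET.Element(f"{{{METS_NS}}}fileSec")
    for use, files in groups.items():
        grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE=use)
        for file_id, path, group_id in files:
            file_el = ET.SubElement(grp, f"{{{METS_NS}}}file", ID=file_id)
            if group_id:
                # Files unpacked from the same bundle share a GROUPID.
                file_el.set("GROUPID", group_id)
            ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", {
                "LOCTYPE": "OTHER",
                "OTHERLOCTYPE": "SYSTEM",
                f"{{{XLINK_NS}}}href": path,
            })
    return file_sec
```

ET.tostring(file_sec, encoding="unicode") serializes the result with mets: and xlink: prefixes.&lt;br /&gt;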
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
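&lt;br /&gt;
The grouping described in the sample above can be approximated in a few lines. This is a hypothetical sketch rather than Archivematica's code. classify and assign_group_ids are illustrative names, the Group-N GROUPID format is an assumption, and files are classified by name only.&lt;br /&gt;

```python
import posixpath

# Name endings that the fileGrp rules treat as metadata files.
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def classify(path, bundle_original=None):
    """Return the fileGrp USE value for one file in the transfer.

    bundle_original is the researcher's uploaded file name when path comes
    from a bundle, or None for files retrieved on their own.
    """
    name = posixpath.basename(path)
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if bundle_original is None or name == bundle_original:
        return "original"
    return "derivative"


def assign_group_ids(bundles):
    """Map every file unpacked from the same bundle to a shared GROUPID.

    bundles maps a bundle directory name to the names of its members.
    """
    ids = {}
    for number, (bundle, members) in enumerate(sorted(bundles.items()), start=1):
        for member in members:
            ids[f"{bundle}/{member}"] = f"Group-{number}"
    return ids
```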
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing and search/discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
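&lt;br /&gt;
The study-level rows of the table can be read out of dataset.json roughly as follows. This sketch assumes the Dataverse 4.x JSON layout: citation fields sit under latestVersion.metadataBlocks.citation.fields, and author is a compound field. ddi_fields is a hypothetical helper. Its keys use DDI Codebook element names, with @ marking an attribute.&lt;br /&gt;

```python
def ddi_fields(dataset):
    """Collect the study-level DDI values listed in the mapping table.

    dataset is the parsed dataset.json (the "data" object of the API
    response), assuming the Dataverse 4.x layout.
    """
    version = dataset.get("latestVersion", {})
    citation = version.get("metadataBlocks", {}).get("citation", {})
    by_type = {f.get("typeName"): f.get("value") for f in citation.get("fields", [])}
    # author is a compound field: a list of dicts keyed by sub-field name.
    authors = [a["authorName"]["value"] for a in by_type.get("author", [])
               if "authorName" in a]
    authority = dataset.get("authority", "").rstrip("/")
    return {
        "titl": by_type.get("title"),
        "IDNo": authority + "/" + dataset.get("identifier", ""),
        "IDNo@agency": dataset.get("protocol"),
        "AuthEnty": authors,
        "distrbtr": dataset.get("publisher"),
        "version": f"{version.get('versionNumber')}.{version.get('versionMinorNumber')}",
        "version@date": version.get("releaseTime"),
        "version@type": version.get("versionState"),
        "restrctn": version.get("termsOfUse"),
    }
```

Stripping any trailing slash from authority before adding one handles both cases. The authority may already end in a slash, as in the json example in the table, or it may not.&lt;br /&gt;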
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
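&lt;br /&gt;
The pair of relationship blocks above can also be generated programmatically. The sketch below uses xml.etree.ElementTree and assumes the PREMIS 3 namespace http://www.loc.gov/premis/v3. The helper names are hypothetical, and the sketch omits any authority attributes that a full PREMIS record might carry.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def _sub(parent, name, text=None):
    """Append a premis:name child, optionally with text, and return it."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        child.text = text
    return child


def derivation_relationship(sub_type, related_uuid):
    """Build a premis:relationship of type derivation pointing at related_uuid."""
    rel = ET.Element(f"{{{PREMIS_NS}}}relationship")
    _sub(rel, "relationshipType", "derivation")
    _sub(rel, "relationshipSubType", sub_type)
    related = _sub(rel, "relatedObjectIdentification")
    _sub(related, "relatedObjectIdentifierType", "UUID")
    _sub(related, "relatedObjectIdentifierValue", related_uuid)
    return rel


def derivation_pair(original_uuid, derivative_uuid):
    """Return relationships for (original SAV file, derivative TAB file)."""
    return (derivation_relationship("is source of", derivative_uuid),
            derivation_relationship("has source", original_uuid))
```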
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
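&lt;br /&gt;
Python's hashlib is enough to sketch a fixity check of this kind. The sketch assumes the MD5 checksums that dataset.json supplies for original files. The example event above names sha256, so in practice the algorithm should match whatever Dataverse actually provided. fixity_check is a hypothetical helper that returns an outcome and note shaped like the PREMIS values above.&lt;br /&gt;

```python
import hashlib


def fixity_check(path, expected_md5):
    """Compare a file's MD5 digest with the checksum supplied by Dataverse.

    Returns an (eventOutcome, eventOutcomeDetailNote) pair.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large data files are not loaded at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual == expected_md5.lower():
        return "Pass", f"Dataverse checksum {expected_md5} verified"
    return "Fail", f"Dataverse checksum {expected_md5}; calculated {actual}"
```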
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
=== dataset.json ===&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
=== agents.json ===&lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
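&lt;br /&gt;
A short script such as the one below can spot-check the bag's payload manifest. It is a sketch, not a replacement for a full BagIt validator. It assumes the manifest-sha512.txt layout of one checksum and one relative path per line (RFC 8493), and it reads each payload file whole. It also ignores bag-info.txt and the tag manifest. verify_manifest is a hypothetical name.&lt;br /&gt;

```python
import hashlib
import os


def verify_manifest(bag_dir, manifest="manifest-sha512.txt"):
    """Return payload paths whose SHA-512 digest differs from the manifest.

    Assumes one "checksum filepath" entry per line, separated by
    whitespace (RFC 8493); percent-encoded paths are not decoded here.
    """
    mismatches = []
    with open(os.path.join(bag_dir, manifest), encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # tolerate blank lines
            checksum, rel_path = line.split(maxsplit=1)
            with open(os.path.join(bag_dir, rel_path), "rb") as payload:
                actual = hashlib.sha512(payload.read()).hexdigest()
            if actual != checksum.lower():
                mismatches.append(rel_path)
    return mismatches
```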
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section collects working notes for future phases as interesting opportunities or questions arise. At the end of the current phase we will document the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see us use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to each other. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance distinguishing the researcher who uploaded files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12646</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12646"/>
		<updated>2018-09-12T15:09:31Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
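&lt;br /&gt;
A filter on search terms or on a particular dataverse can be expressed as a query against the Dataverse Search API. The minimal sketch below only builds such a query URL. It assumes the documented ''/api/search'' endpoint and its ''q'', ''type'', ''subtree'', ''per_page'' and ''start'' parameters. The server address, the dataverse alias and the helper name ''build_search_url'' are hypothetical and are not taken from the Storage Service code.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def build_search_url(base_url, query="*", subtree=None, per_page=10, start=0):
    """Build a Dataverse Search API URL that lists datasets only.

    query   -- free-text search terms, e.g. a Transfer Source filter
    subtree -- optional alias of one dataverse to restrict the search to
    """
    params = {"q": query, "type": "dataset", "per_page": per_page, "start": start}
    if subtree:
        params["subtree"] = subtree
    return base_url.rstrip("/") + "/api/search?" + urlencode(params)


# Hypothetical server and dataverse alias, for illustration only.
print(build_search_url("https://dataverse.example.org", query="weather", subtree="climate"))
```
&lt;br /&gt;
Fixing ''type'' to dataset means that only datasets are returned, not individual files or dataverses. Datasets are the units a user picks in the Transfer Browser.&lt;br /&gt;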
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
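&lt;br /&gt;
As a rough illustration of this step, the sketch below prepares a Native API request for one dataset by its persistent identifier, without sending it. It assumes the ''/api/datasets/:persistentId/'' endpoint and the ''X-Dataverse-key'' header for the API token. The server, DOI and token are placeholders, the helper name is hypothetical, and the Storage Service may issue the call differently.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode
from urllib.request import Request


def dataset_request(base_url, persistent_id, api_key):
    """Prepare (but do not send) a Native API request for one dataset."""
    query = urlencode({"persistentId": persistent_id})
    url = base_url.rstrip("/") + "/api/datasets/:persistentId/?" + query
    # The API token stands in for the credentials stored in the Space.
    return Request(url, headers={"X-Dataverse-key": api_key})


req = dataset_request("https://dataverse.example.org", "doi:10.5072/FK2/0MOPJM", "my-api-token")
print(req.full_url)
# urllib normalises header names with str.capitalize(), hence the lookup key.
print(req.get_header("X-dataverse-key"))
```
&lt;br /&gt;
Passing the request to ''urllib.request.urlopen'' would return the dataset JSON. Sending the token in a header rather than in the query string helps keep it out of server access logs.&lt;br /&gt;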
&lt;br /&gt;
[3] '''Prepare Transfer'''&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
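&lt;br /&gt;
The sketch below shows how the file list and MD5 checksums might be pulled out of such a file. The JSON is a trimmed, hypothetical sample loosely modelled on the Dataverse Native API response (''latestVersion'' → ''files'' → ''dataFile''). A real dataset.json carries far more metadata, and its exact key names (the identifiers are referred to above as entity_id) should be checked against an actual file.&lt;br /&gt;
&lt;br /&gt;
```python
import json

# Trimmed, hypothetical dataset.json; the md5 values are placeholders.
SAMPLE = """
{
  "id": 42,
  "latestVersion": {
    "versionNumber": 1,
    "versionMinorNumber": 0,
    "files": [
      {"dataFile": {"id": 101, "filename": "Study_info.pdf",
                    "contentType": "application/pdf",
                    "md5": "0cc175b9c0f1b6a831c399e269772661"}},
      {"dataFile": {"id": 102, "filename": "YVR_weather_data.tab",
                    "contentType": "text/tab-separated-values",
                    "originalFileFormat": "application/x-spss-sav",
                    "md5": "92eb5ffee6ae2fec3ad71c777531578f"}}
    ]
  }
}
"""


def list_datafiles(dataset):
    """Return (file id, filename, md5) for every data file in the latest version."""
    return [
        (f["dataFile"]["id"], f["dataFile"]["filename"], f["dataFile"]["md5"])
        for f in dataset["latestVersion"]["files"]
    ]


for file_id, name, md5 in list_datafiles(json.loads(SAMPLE)):
    print(file_id, name, md5)
```
&lt;br /&gt;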
&lt;br /&gt;
[4] If the json file lists a data file with a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) download. For such tabular files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
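&lt;br /&gt;
A minimal sketch of this decision follows. It assumes the Access API paths ''/api/access/datafile/{id}'' for a single file and ''/api/access/datafile/bundle/{id}'' for the bundle. It also assumes that ingested tabular files are reported with the content type ''text/tab-separated-values''. The helper name and sample entries are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
TABULAR = "text/tab-separated-values"


def download_url(base_url, datafile):
    """Pick the Access API URL for one datafile entry from dataset.json.

    Ingested tabular files are fetched as a zipped bundle (tab file, original
    upload, other derivatives, DDI XML and citations); anything else is
    fetched on its own.
    """
    base = base_url.rstrip("/") + "/api/access/datafile/"
    if datafile.get("contentType") == TABULAR:
        return base + "bundle/" + str(datafile["id"])
    return base + str(datafile["id"])


files = [
    {"id": 101, "filename": "Study_info.pdf", "contentType": "application/pdf"},
    {"id": 102, "filename": "YVR_weather_data.tab", "contentType": TABULAR},
]
for f in files:
    print(f["filename"], download_url("https://dataverse.example.org", f))
```
&lt;br /&gt;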
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, which Dataverse documents as containing the following: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. the original uploaded data file, derivative files generated for tabular data, and metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
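&lt;br /&gt;
To make the fileSec and structMap idea concrete, here is a small sketch that builds both sections with Python's ''xml.etree.ElementTree''. It is not Archivematica's METS generator. The input format, the ''file_N'' IDs, the GROUPID value and the flat structMap are illustrative assumptions, and the dmdSec is left out.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def build_mets(files):
    """Build a METS skeleton with a fileSec and a flat physical structMap.

    files -- list of dicts with 'path' and 'use' keys, plus optional
             'md5' (checksum from dataset.json) and 'group' (GROUPID).
    """
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    struct_map = ET.SubElement(mets, m("structMap"), TYPE="physical")
    root_div = ET.SubElement(struct_map, m("div"), TYPE="Directory", LABEL="objects")
    groups = {}  # one fileGrp per USE value, created on first use
    for n, f in enumerate(files, start=1):
        grp = groups.get(f["use"])
        if grp is None:
            grp = ET.SubElement(file_sec, m("fileGrp"), USE=f["use"])
            groups[f["use"]] = grp
        file_id = "file_%d" % n
        el = ET.SubElement(grp, m("file"), ID=file_id)
        if "md5" in f:
            el.set("CHECKSUM", f["md5"])
            el.set("CHECKSUMTYPE", "MD5")
        if "group" in f:
            el.set("GROUPID", f["group"])
        loc = ET.SubElement(el, m("FLocat"), LOCTYPE="OTHER", OTHERLOCTYPE="SYSTEM")
        loc.set("{%s}href" % XLINK_NS, f["path"])
        # Each file also gets an Item div in the structMap, linked by FILEID.
        item = ET.SubElement(root_div, m("div"), TYPE="Item", LABEL=f["path"])
        ET.SubElement(item, m("fptr"), FILEID=file_id)
    return mets


files = [
    {"path": "Study_info.pdf", "use": "original",
     "md5": "0cc175b9c0f1b6a831c399e269772661"},
    {"path": "YVR_weather_data/YVR_weather_data.sav", "use": "original", "group": "Group-1"},
    {"path": "YVR_weather_data/YVR_weather_data.tab", "use": "derivative", "group": "Group-1"},
    {"path": "YVR_weather_data/YVR_weather_data-ddi.xml", "use": "metadata", "group": "Group-1"},
]
print(ET.tostring(build_mets(files), encoding="unicode"))
```
&lt;br /&gt;
Each file appears once in a fileGrp (chosen by USE) and once in the structMap, and the two are linked through the file ID. Files that share a GROUPID came out of the same bundle.&lt;br /&gt;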
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The endnote and ris files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI xml file.&lt;br /&gt;
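&lt;br /&gt;
The grouping rules described here and in the metadata-source table can be sketched as a small classifier. This is an illustrative reading of those rules, not the ingest tool's code. It assumes the name of the original upload inside each bundle is already known (for example from its dataset.json entry), and the ''Group-'' prefix for GROUPID values is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid

# Suffixes that the metadata-source table assigns to fileGrp USE="metadata".
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_use(name, in_bundle=False, original_name=None):
    """Return the fileGrp USE value for one file name."""
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if in_bundle and name != original_name:
        return "derivative"  # e.g. the TAB and RData files Dataverse generated
    return "original"


def classify_bundle(bundle_files, original_name):
    """Classify every file unpacked from one bundle and give them one shared GROUPID."""
    group_id = "Group-" + str(uuid.uuid4())
    return [(name, file_use(name, True, original_name), group_id) for name in bundle_files]


bundle = [
    "YVR_weather_data.sav",
    "YVR_weather_data.tab",
    "YVR_weather_data.RData",
    "YVR_weather_data-ddi.xml",
    "YVR_weather_datacitation-endnote.xml",
    "YVR_weather_datacitation-ris.ris",
]
for name, use, group in classify_bundle(bundle, "YVR_weather_data.sav"):
    print(use.ljust(10), name, group)
print(file_use("Study_info.pdf"), file_use("dataset.json"))
```
&lt;br /&gt;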
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing and search/discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
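&lt;br /&gt;
The DDI rows of the table above can be sketched as follows. The input is a trimmed, hypothetical dataset.json that has already been parsed. The element nesting (citation, titlStmt, rspStmt, distStmt, verStmt, dataAccs/useStmt) follows the DDI Codebook 2.5 schema as we understand it. The output is not guaranteed to match what Archivematica writes into dmdSec_1.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("ddi", DDI_NS)

# Hypothetical, heavily trimmed dataset.json (already parsed).
SAMPLE = {
    "protocol": "doi",
    "authority": "10.5072/FK2/",
    "identifier": "0MOPJM",
    "publisher": "Root Dataverse",
    "latestVersion": {
        "releaseTime": "2018-05-01T14:00:00Z",
        "versionState": "RELEASED",
        "versionNumber": 1,
        "versionMinorNumber": 0,
        "termsOfUse": "CC0 Waiver",
        "metadataBlocks": {"citation": {"fields": [
            {"typeName": "title", "value": "Pacific weather patterns study"},
            {"typeName": "author", "value": [
                {"authorName": {"typeName": "authorName", "value": "Doe, Jane"}}]},
        ]}},
    },
}


def d(tag):
    """Qualify a tag name with the DDI namespace."""
    return "{%s}%s" % (DDI_NS, tag)


def citation_values(version, type_name):
    """Collect values for one citation field, reaching into author compounds."""
    found = []
    for field in version["metadataBlocks"]["citation"]["fields"]:
        if field["typeName"] == type_name:
            found.append(field["value"])
        elif field["typeName"] == "author" and type_name == "authorName":
            found.extend(a["authorName"]["value"] for a in field["value"])
    return found


def build_ddi(ds):
    v = ds["latestVersion"]
    codebook = ET.Element(d("codeBook"))
    stdy = ET.SubElement(codebook, d("stdyDscr"))
    citation = ET.SubElement(stdy, d("citation"))
    titl_stmt = ET.SubElement(citation, d("titlStmt"))
    ET.SubElement(titl_stmt, d("titl")).text = citation_values(v, "title")[0]
    idno = ET.SubElement(titl_stmt, d("IDNo"), agency=ds["protocol"])
    idno.text = ds["authority"] + ds["identifier"]
    rsp = ET.SubElement(citation, d("rspStmt"))
    for name in citation_values(v, "authorName"):
        ET.SubElement(rsp, d("AuthEnty")).text = name
    dist = ET.SubElement(citation, d("distStmt"))
    ET.SubElement(dist, d("distrbtr")).text = ds["publisher"]
    ver_stmt = ET.SubElement(citation, d("verStmt"))
    ver = ET.SubElement(ver_stmt, d("version"),
                        date=v["releaseTime"][:10], type=v["versionState"])
    ver.text = "%s.%s" % (v["versionNumber"], v["versionMinorNumber"])
    use_stmt = ET.SubElement(ET.SubElement(stdy, d("dataAccs")), d("useStmt"))
    ET.SubElement(use_stmt, d("restrctn")).text = v["termsOfUse"]
    return codebook


print(ET.tostring(build_ddi(SAMPLE), encoding="unicode"))
```
&lt;br /&gt;
Following the table, ddi:IDNo joins authority and identifier and takes its agency attribute from protocol. The version number is assembled from versionNumber and versionMinorNumber.&lt;br /&gt;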
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
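&lt;br /&gt;
The paired relationships in the example below can be generated with a small helper. This sketch reuses the PREMIS 2 element names from that example (''relatedObjectIdentification'') and assumes the PREMIS 2 namespace ''info:lc/xmlns/premis-v2''. The helper names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

# Assumed PREMIS 2 namespace, matching the element names in the example below.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return "{%s}%s" % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    """Build one premis:relationship of type derivation pointing at related_uuid."""
    rel = ET.Element(p("relationship"))
    ET.SubElement(rel, p("relationshipType")).text = "derivation"
    ET.SubElement(rel, p("relationshipSubType")).text = sub_type
    ident = ET.SubElement(rel, p("relatedObjectIdentification"))
    ET.SubElement(ident, p("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(ident, p("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def derivation_pair(original_uuid, derivative_uuid):
    """Relationships for the original (is source of) and the derivative (has source)."""
    return (
        derivation_relationship("is source of", derivative_uuid),
        derivation_relationship("has source", original_uuid),
    )


sav_uuid, tab_uuid = str(uuid.uuid4()), str(uuid.uuid4())
for rel in derivation_pair(sav_uuid, tab_uuid):
    print(ET.tostring(rel, encoding="unicode"))
```
&lt;br /&gt;
Generating both sides together keeps the two objects pointing at each other, so neither relationship can be written without its inverse.&lt;br /&gt;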
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recent Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
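&lt;br /&gt;
The comparison behind this event can be sketched with Python's ''hashlib''. The CHECKSUM row of the table above shows that dataset.json supplies MD5 values, so the sketch uses MD5. The function names are hypothetical, and the returned note only mimics the eventOutcomeDetailNote wording in the example.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, dataverse_md5):
    """Compare a file with the checksum Dataverse published; return (outcome, note)."""
    actual = md5_of(path)
    if actual == dataverse_md5.strip().lower():
        return "Pass", "Dataverse checksum %s verified" % dataverse_md5
    return "Fail", "Dataverse checksum %s does not match %s" % (dataverse_md5, actual)
```
&lt;br /&gt;
Calling ''fixity_outcome'' with a downloaded file and the md5 value from its dataset.json entry returns the outcome and a note for the event. Lower-casing the published value avoids false failures caused only by hex-digit case.&lt;br /&gt;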
&lt;br /&gt;
== Dataset Metadata files == &lt;br /&gt;
&lt;br /&gt;
'''dataset.json'''&lt;br /&gt;
This file is provided by Dataverse. It lists all files provided in the dataset, and provides checksums for all original files (it does not currently provide checksums for derivatives or metadata files created by Dataverse). &lt;br /&gt;
&lt;br /&gt;
'''agents.json''' &lt;br /&gt;
This file is created by Archivematica. It includes the Agent information that is entered into the Storage Service when configuring a Dataverse Location. To do: add link to final docs once they are updated. &lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
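&lt;br /&gt;
Because the AIP is a BagIt bag, its payload can be checked against ''manifest-sha512.txt''. The following is a minimal sketch. It assumes an uncompressed (or already extracted) bag and manifest lines of the form checksum, whitespace, path. It does not handle the percent-encoded paths that the BagIt specification allows, and it ignores the tag manifest. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os


def verify_manifest(bag_dir, algorithm="sha512"):
    """Return payload paths whose digest differs from manifest-ALGORITHM.txt."""
    manifest = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    mismatches = []
    with open(manifest, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # tolerate blank lines
            expected, rel_path = line.split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, rel_path), "rb") as payload:
                for chunk in iter(lambda: payload.read(1024 * 1024), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                mismatches.append(rel_path)
    return mismatches
```
&lt;br /&gt;
On an intact, extracted AIP, ''verify_manifest(bag_dir)'' should return an empty list. Any path it returns points to a payload file whose content no longer matches the manifest, and a missing payload file raises FileNotFoundError.&lt;br /&gt;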
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of &lt;br /&gt;
an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.xml (standard Archivematica transfer METS file listing the contents of the transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section collects working notes for future phases, added as interesting opportunities or questions arise. At the end of the current phase we will document the integration as well as future opportunities.&lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see us use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to determine whether we could use the Native API (even if we don’t actually use it), or raise the issue for discussion with the Dataverse team.&lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to each other. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this would be a new feature that needs to be thought through… there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance differentiating between the researcher who uploaded the files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12645</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12645"/>
		<updated>2018-09-12T14:58:03Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of the work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository&lt;br /&gt;
      And the dataset has been published in Dataverse&lt;br /&gt;
    When the user selects the transfer type “Dataverse”&lt;br /&gt;
      And the user selects the dataset to be preserved&lt;br /&gt;
      And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
      And the user enters the (optional) &amp;lt;Accession number&amp;gt;&lt;br /&gt;
      And the user clicks the “Start Transfer” button&lt;br /&gt;
    Then Archivematica copies the files from Dataverse to a local processing directory&lt;br /&gt;
      And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
      And the user selects yes&lt;br /&gt;
      And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
      And the Dataverse metadata files are generated and included in a metadata directory&lt;br /&gt;
      And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
      And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
      And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
      And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
&lt;br /&gt;
[3] '''Prepare Transfer'''&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file gives a data file's content_type as tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For TSV files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
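&lt;br /&gt;
The branching in step [4] can be sketched as follows. This is a minimal illustration, not Archivematica's implementation. It assumes a simplified, hypothetical list of file records with filename and content_type keys; the real dataset.json nests file information differently. It also assumes that tabular files carry the MIME type text/tab-separated-values.&lt;br /&gt;
&lt;br /&gt;
```python
# Content type assumed for tabular files that Dataverse has ingested.
TSV_TYPE = 'text/tab-separated-values'


def plan_downloads(files):
    # Split hypothetical file records (dicts with 'filename' and
    # 'content_type' keys) into single-file and bundle downloads.
    singles = []
    bundles = []
    for record in files:
        if record['content_type'] == TSV_TYPE:
            # Tabular files are fetched as a zipped bundle containing the
            # original upload plus the derivatives Dataverse generated.
            bundles.append(record['filename'])
        else:
            singles.append(record['filename'])
    return singles, bundles


files = [
    {'filename': 'Study_info.pdf', 'content_type': 'application/pdf'},
    {'filename': 'YVR_weather_data.tab', 'content_type': TSV_TYPE},
]
print(plan_downloads(files))
```
&lt;br /&gt;
Run as written, it prints (['Study_info.pdf'], ['YVR_weather_data.tab']). The PDF is downloaded on its own, while the TAB file triggers a bundle download.&lt;br /&gt;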
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
&lt;br /&gt;
[5] The Dataverse METS file will consist of a dmdSec containing the DDI elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
The Dataverse METS is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/metadata/transfers/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
(This is also where you will find the dataset.json metadata file provided by Dataverse, and the agents.json metadata file created by Archivematica). &lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
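&lt;br /&gt;
The file-group rules from the metadata-sources table in this section, plus the shared GROUPID for unpacked bundle files, could be expressed roughly as follows. This is a hypothetical sketch rather than the ingest tool's code. The function names and the Group- prefix on the GROUPID are assumptions. The caller is also assumed to already know which bundle member is the researcher's original upload (for example from originalFileFormat in dataset.json).&lt;br /&gt;
&lt;br /&gt;
```python
import uuid

# Filename endings that the metadata-sources table assigns to the
# metadata fileGrp.
METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_use(name, in_bundle, is_original):
    # Return the fileGrp USE value for one file.
    if name.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    if not in_bundle or is_original:
        # Stand-alone files and the researcher's upload inside a bundle.
        return 'original'
    # Everything else in a bundle was generated by Dataverse.
    return 'derivative'


def classify_bundle(names, original_name):
    # Every file unpacked from one bundle shares a single GROUPID.
    # The 'Group-' prefix is an illustrative assumption.
    group_id = 'Group-' + str(uuid.uuid4())
    result = {}
    for name in names:
        result[name] = (file_use(name, True, name == original_name), group_id)
    return result


bundle = [
    'YVR_weather_data.sav',
    'YVR_weather_data.tab',
    'YVR_weather_data.RData',
    'YVR_weather_data-ddi.xml',
    'YVR_weather_datacitation-endnote.xml',
    'YVR_weather_datacitation-ris.ris',
]
for name, (use, group_id) in classify_bundle(bundle, 'YVR_weather_data.sav').items():
    print(name, use)
print('dataset.json', file_use('dataset.json', False, False))
print('Study_info.pdf', file_use('Study_info.pdf', False, False))
```
&lt;br /&gt;
With the sample bundle, the classification is:&lt;br /&gt;
* original: the SAV file&lt;br /&gt;
* derivative: the TAB and RData files&lt;br /&gt;
* metadata: the DDI, EndNote and RIS files&lt;br /&gt;
This matches the three file groups described for the sample Dataverse METS file.&lt;br /&gt;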
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing &amp;amp; search / discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, and also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
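&lt;br /&gt;
The sketch below illustrates the first few rows of the table by deriving three values from dataset.json: the DDI IDNo, its agency attribute and the version string. It rests on several assumptions:&lt;br /&gt;
* It uses a flat, hypothetical dictionary. In the real dataset.json the version fields sit inside the dataset version object.&lt;br /&gt;
* Joining authority and identifier directly is an assumption, as is the major.minor version format.&lt;br /&gt;
* The version numbers in the example are made up.&lt;br /&gt;
&lt;br /&gt;
```python
def ddi_identifiers(dataset):
    # Map a flat, hypothetical dict of dataset.json fields to DDI values.
    # Joining authority and identifier directly is an assumption based on
    # the example values in the table.
    return {
        'IDNo': dataset['authority'] + dataset['identifier'],
        'IDNo_agency': dataset['protocol'],
        'version': '{}.{}'.format(dataset['versionNumber'],
                                  dataset['versionMinorNumber']),
    }


example = {
    'authority': '10.5072/FK2/',
    'identifier': '0MOPJM',
    'protocol': 'doi',
    'versionNumber': 1,
    'versionMinorNumber': 0,
}
print(ddi_identifiers(example))
```
&lt;br /&gt;
For the example values it returns an IDNo of 10.5072/FK2/0MOPJM with agency doi and version 1.0.&lt;br /&gt;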
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
During transfer processing, a Transfer METS file is created. This is found in the final AIP in this location: &amp;lt;AIP Name&amp;gt;/data/objects/submissionDocumentation/&amp;lt;transfer name&amp;gt;/METS.xml&lt;br /&gt;
&lt;br /&gt;
This is an existing (standard) process that hasn't been changed in this project.&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
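&lt;br /&gt;
The comparison behind this event can be sketched in Python. This minimal sketch assumes the Dataverse-supplied value is the MD5 checksum listed in dataset.json (see the CHECKSUM row of the metadata table). The sample event above records hashlib.sha256(), so the hash function should match whichever algorithm Dataverse actually supplied. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def md5_of(path, chunk_size=65536):
    # Compute a file's MD5 hex digest, reading it in chunks so that large
    # data files do not have to fit in memory.
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, dataverse_md5):
    # Compare the locally computed digest with the checksum supplied by
    # Dataverse and return a string intended for premis:eventOutcome.
    if md5_of(path) == dataverse_md5.strip().lower():
        return 'Pass'
    return 'Fail'
```
&lt;br /&gt;
Normalising the supplied checksum to lower case avoids a spurious failure when the two systems format the hex digest differently.&lt;br /&gt;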
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
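&lt;br /&gt;
Because the AIP is a standard bag, its payload can be re-checked against manifest-sha512.txt at any time. The sketch below makes a few assumptions:&lt;br /&gt;
* The bag has already been extracted from the 7z archive.&lt;br /&gt;
* Each manifest line holds exactly one checksum and one relative path.&lt;br /&gt;
* The function names are hypothetical.&lt;br /&gt;
It is not a full BagIt validator; the Library of Congress bagit-python library provides complete validation.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os


def read_manifest(bag_dir, name='manifest-sha512.txt'):
    # Parse a BagIt payload manifest into a dict of relative path to checksum.
    entries = {}
    with open(os.path.join(bag_dir, name), encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Checksum first, then whitespace, then the path (which may
            # itself contain spaces, hence the single split).
            checksum, path = line.split(None, 1)
            entries[path.strip()] = checksum.lower()
    return entries


def invalid_payload_files(bag_dir):
    # Return manifest paths whose SHA-512 digest no longer matches.
    bad = []
    for path, expected in read_manifest(bag_dir).items():
        digest = hashlib.sha512()
        with open(os.path.join(bag_dir, path), 'rb') as handle:
            for chunk in iter(lambda: handle.read(65536), b''):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            bad.append(path)
    return bad
```
&lt;br /&gt;
An empty list means every payload file listed in the manifest still matches its recorded SHA-512 digest.&lt;br /&gt;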
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the &lt;br /&gt;
PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:authEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
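&lt;br /&gt;
This is a minimal sketch of how the fileGrp USE values outlined above could be tallied from an AIP METS file. It assumes Python's standard xml.etree.ElementTree and the METS namespace http://www.loc.gov/METS/. The helper names and the tiny demo tree are hypothetical. For a real AIP, pass ET.parse(path).getroot() instead of the demo tree.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def q(tag: str) -> str:
    """Qualify a METS tag name in ElementTree's {namespace}tag form."""
    return f"{{{METS_NS}}}{tag}"


def count_files_by_use(mets_root: ET.Element) -> dict:
    """Return {fileGrp USE value: number of mets:file children}."""
    counts = {}
    for grp in mets_root.iter(q("fileGrp")):
        use = grp.get("USE", "")
        counts[use] = counts.get(use, 0) + len(grp.findall(q("file")))
    return counts


def demo_tree() -> ET.Element:
    """Build a tiny, hypothetical AIP METS skeleton for illustration."""
    root = ET.Element(q("mets"))
    file_sec = ET.SubElement(root, q("fileSec"))
    sample = {
        "original": ["chelan_052.jpg", "Weather_data.sav"],
        "derivative": ["Weather_data/Weather_data.tab"],
        "metadata": ["dataset.json"],
    }
    for use, paths in sample.items():
        grp = ET.SubElement(file_sec, q("fileGrp"), USE=use)
        for i in range(len(paths)):
            ET.SubElement(grp, q("file"), ID=f"file-{use}-{i}")
    return root


if __name__ == "__main__":
    print(count_files_by_use(demo_tree()))
    # prints {'original': 2, 'derivative': 1, 'metadata': 1}
```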
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section collects working notes for future phases, added as interesting opportunities or questions arise. At the end of the current phase we will document the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are either using a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to determine whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not yet clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed the question of whether all datasets in a dataverse would be preserved (not currently known), which brought up the question of how to relate datasets. &lt;br /&gt;
We talked about AICs (Archival Information Collections) as one possible solution, but agreed that it’s a new feature that needs to be thought through; there could be other solutions besides AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents: for instance, differentiating the researcher who uploaded files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include the equivalent of a fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12644</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12644"/>
		<updated>2018-09-12T14:47:10Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* AIP METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of the work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
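&lt;br /&gt;
This is a minimal sketch of the kind of request involved. It assumes the Dataverse native API endpoint /api/datasets/:persistentId/ and its X-Dataverse-key header; the Storage Service's actual calls may differ. The server and token are placeholders, and the DOI reuses the authority/identifier example from the metadata table on this page.&lt;br /&gt;
```python
from urllib.parse import urlencode
from urllib.request import Request


def build_dataset_request(server: str, persistent_id: str, api_token: str) -> Request:
    """Build, but do not send, a GET request for one dataset's JSON metadata.

    Assumes the Dataverse native API endpoint
    /api/datasets/:persistentId/?persistentId=... and the X-Dataverse-key header.
    """
    query = urlencode({"persistentId": persistent_id})
    url = f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"
    # The token stands in for the credentials kept in the Storage Service Space.
    return Request(url, headers={"X-Dataverse-key": api_token})


if __name__ == "__main__":
    req = build_dataset_request(
        "https://dataverse.example.org", "doi:10.5072/FK2/0MOPJM", "API-TOKEN"
    )
    print(req.get_method(), req.full_url)
    # To actually fetch it: json.load(urllib.request.urlopen(req))
```
Building the request separately from sending it keeps the URL and header logic testable without network access.&lt;br /&gt;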
&lt;br /&gt;
'''[3] Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
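&lt;br /&gt;
The file-level fields described above can be read with a few lines of Python. This sketch assumes the dataset.json layout follows the Dataverse native API response: latestVersion, then files, then dataFile with id, filename, contentType and md5 keys. It uses the dataFile id rather than the entity_id naming mentioned above. The class and function names are hypothetical.&lt;br /&gt;
```python
from typing import NamedTuple


class DataFile(NamedTuple):
    file_id: int
    filename: str
    content_type: str
    md5: str


def list_data_files(dataset: dict) -> list:
    """List id, filename, content type and md5 for each file in the latest version.

    Assumes the native API layout: latestVersion -> files[] -> dataFile.
    Missing keys fall back to empty values rather than raising.
    """
    entries = dataset.get("latestVersion", {}).get("files", [])
    files = []
    for entry in entries:
        df = entry.get("dataFile", {})
        files.append(
            DataFile(
                file_id=df.get("id"),
                filename=df.get("filename", ""),
                content_type=df.get("contentType", ""),
                md5=df.get("md5", ""),
            )
        )
    return files


if __name__ == "__main__":
    sample = {  # hypothetical, trimmed-down dataset.json
        "latestVersion": {
            "files": [
                {"dataFile": {"id": 11, "filename": "Study_info.pdf",
                              "contentType": "application/pdf", "md5": "placeholder-md5-1"}},
                {"dataFile": {"id": 12, "filename": "YVR_weather_data.tab",
                              "contentType": "text/tab-separated-values", "md5": "placeholder-md5-2"}},
            ]
        }
    }
    for f in list_data_files(sample):
        print(f.file_id, f.filename, f.content_type, f.md5)
```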
&lt;br /&gt;
[4] If the json file lists a data file with a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
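&lt;br /&gt;
This is a sketch of the download decision in step [4]. It assumes the Dataverse Data Access API paths /api/access/datafile/{id} for a single file and /api/access/datafile/bundle/{id} for a bundle. It also assumes that tabular files are flagged by the text/tab-separated-values content type. The function name and server URL are hypothetical.&lt;br /&gt;
```python
TAB_CONTENT_TYPE = "text/tab-separated-values"


def download_url(server: str, file_id: int, content_type: str) -> str:
    """Pick the bundle endpoint for tabular files and the plain one otherwise.

    Assumes the Dataverse Data Access API paths /api/access/datafile/{id}
    and /api/access/datafile/bundle/{id}.
    """
    base = f"{server.rstrip('/')}/api/access/datafile"
    # Ignore any parameters such as "; charset=UTF-8" on the content type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == TAB_CONTENT_TYPE:
        # Tabular files are fetched as a zipped bundle (.tab, original, RData, DDI, citations).
        return f"{base}/bundle/{file_id}"
    return f"{base}/{file_id}"


if __name__ == "__main__":
    print(download_url("https://dataverse.example.org", 12, "text/tab-separated-values"))
    print(download_url("https://dataverse.example.org", 11, "application/pdf"))
```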
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
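&lt;br /&gt;
Once a bundle has been retrieved, it has to be unpacked. This is a minimal sketch using Python's zipfile module. It hypothetically assumes that each bundle is extracted into a directory named after the zip file, matching the one-directory-per-bundle layout described in the AIP structure section. The helper name is hypothetical.&lt;br /&gt;
```python
import zipfile
from pathlib import Path


def unpack_bundle(bundle_zip: Path, dest_root: Path) -> list:
    """Extract a Dataverse bundle into its own directory and list what it held.

    The directory is named after the zip (e.g. Weather_data.zip becomes
    Weather_data/). Returns the extracted paths in sorted name order.
    """
    target = dest_root / bundle_zip.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_zip) as zf:
        names = zf.namelist()
        # extractall strips absolute paths and ".." components from member names.
        zf.extractall(target)
    return [target / name for name in sorted(names)]
```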
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups: USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
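&lt;br /&gt;
This is a minimal sketch of how a fileSec like the one described above could be generated with xml.etree.ElementTree. Every file from the same bundle gets a shared GROUPID, and each bundle gets its own. The input shapes, the file- and Group- ID prefixes, and the LOCTYPE/OTHERLOCTYPE values are assumptions for illustration, not the ingest tool's actual code.&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_file_sec(groups: dict, bundle_of: dict) -> ET.Element:
    """Build a mets:fileSec with one fileGrp per USE value.

    groups maps a USE value (original, derivative, metadata) to relative paths;
    bundle_of maps a path to the name of the bundle it was unpacked from.
    Files from the same bundle share one GROUPID; each bundle gets its own.
    """
    file_sec = ET.Element(f"{{{METS}}}fileSec")
    group_ids = {}  # bundle name -> GROUPID
    for use, paths in groups.items():
        grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE=use)
        for path in paths:
            attrs = {"ID": f"file-{uuid.uuid4()}"}
            bundle = bundle_of.get(path)
            if bundle is not None:
                if bundle not in group_ids:
                    group_ids[bundle] = f"Group-{uuid.uuid4()}"
                attrs["GROUPID"] = group_ids[bundle]
            file_el = ET.SubElement(grp, f"{{{METS}}}file", attrs)
            ET.SubElement(file_el, f"{{{METS}}}FLocat", {
                "LOCTYPE": "OTHER",
                "OTHERLOCTYPE": "SYSTEM",
                f"{{{XLINK}}}href": path,
            })
    return file_sec


if __name__ == "__main__":
    b = "YVR_weather_data/"
    groups = {
        "original": ["Study_info.pdf", b + "YVR_weather_data.sav"],
        "derivative": [b + "YVR_weather_data.tab", b + "YVR_weather_data.RData"],
        "metadata": ["dataset.json", b + "YVR_weather_data-ddi.xml"],
    }
    bundle_of = {p: "YVR_weather_data" for ps in groups.values() for p in ps if p.startswith(b)}
    print(ET.tostring(build_file_sec(groups, bundle_of), encoding="unicode"))
```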
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing and search/discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:authEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
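&lt;br /&gt;
The fileGrp rules in the table can be expressed as a small function. This is a hypothetical sketch. Deciding whether a bundled file is the researcher's uploaded original (for example, from originalFileFormat in dataset.json) is left to the caller.&lt;br /&gt;
```python
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_group(filename: str, in_bundle: bool, is_uploaded_original: bool) -> str:
    """Assign a fileGrp USE value following the rules in the table above."""
    if filename.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if not in_bundle or is_uploaded_original:
        # Non-tabular data files, and the researcher's own upload inside a bundle.
        return "original"
    # Everything else in a bundle (TAB, RData, ...) was generated by Dataverse.
    return "derivative"


if __name__ == "__main__":
    sample = [
        ("Study_info.pdf", False, False),
        ("YVR_weather_data.sav", True, True),
        ("YVR_weather_data.tab", True, False),
        ("YVR_weather_data-ddi.xml", True, False),
    ]
    for name, in_bundle, is_original in sample:
        print(name, file_group(name, in_bundle, is_original))
```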
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the Dataverse METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate the Event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type in the Library of Congress controlled vocabulary at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we submitted it as a proposed new term (November 2015) to http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary, a list of new terms being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
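&lt;br /&gt;
This is a sketch of generating the paired relationships shown in the two examples with xml.etree.ElementTree. It assumes the PREMIS 2 namespace info:lc/xmlns/premis-v2, since element names used above such as relatedObjectIdentification are PREMIS 2 names. The function names are hypothetical.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS)


def q(tag: str) -> str:
    """Qualify a PREMIS tag name in ElementTree's {namespace}tag form."""
    return f"{{{PREMIS}}}{tag}"


def derivation_relationship(sub_type: str, related_uuid: str) -> ET.Element:
    """Build one premis:relationship of type derivation pointing at related_uuid."""
    rel = ET.Element(q("relationship"))
    ET.SubElement(rel, q("relationshipType")).text = "derivation"
    ET.SubElement(rel, q("relationshipSubType")).text = sub_type
    ident = ET.SubElement(rel, q("relatedObjectIdentification"))
    ET.SubElement(ident, q("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def link_original_and_derivative(original_uuid: str, derivative_uuid: str) -> tuple:
    """Return (relationship for the original file, relationship for the derivative)."""
    return (
        derivation_relationship("is source of", derivative_uuid),
        derivation_relationship("has source", original_uuid),
    )


if __name__ == "__main__":
    for rel in link_original_and_derivative("[SPSS SAV file UUID]", "[TAB file UUID]"):
        print(ET.tostring(rel, encoding="unicode"))
```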
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
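&lt;br /&gt;
This is a minimal sketch of the comparison behind this event. It assumes the expected value is the md5 checksum Dataverse supplies in dataset.json (see CHECKSUM in the metadata table). The algorithm parameter also allows sha256, as named in the eventDetail above. The helper name and result keys are hypothetical; the keys simply mirror the eventOutcome and eventOutcomeDetailNote elements.&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def fixity_check(path: Path, expected: str, algorithm: str = "md5") -> dict:
    """Compare a file's digest with the checksum supplied by Dataverse.

    Returns a dict whose keys mirror premis:eventOutcome and
    premis:eventOutcomeDetailNote. Hypothetical helper.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(2 ** 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    expected = expected.strip().lower()
    if actual == expected:
        note = f"Dataverse checksum {expected} verified"
        return {"eventOutcome": "Pass", "eventOutcomeDetailNote": note}
    note = f"Dataverse checksum {expected} does not match computed {actual}"
    return {"eventOutcome": "Fail", "eventOutcomeDetailNote": note}
```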
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
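&lt;br /&gt;
This is a minimal sketch of checking the bag payload against manifest-sha512.txt with hashlib. It assumes an uncompressed bag and ignores BagIt details such as percent-encoded file names and tag manifests. For full validation, the bagit-python library's Bag.validate() is the usual tool. The function name is hypothetical.&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir: Path, algorithm: str = "sha512") -> list:
    """Return manifest entries that are missing or whose digest does not match.

    Minimal sketch: it does not look for payload files absent from the
    manifest and does not check tag manifests.
    """
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is: checksum, whitespace, path relative to the bag root.
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(rel_path)
    return problems
```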
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows:&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, recorded as interesting opportunities or questions arise. At the end of the current phase, we will document the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see us use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So, as part of the project, we need to check whether we could use the Native API (even if we don’t actually use it), or raise the issue for discussion with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through… there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents; for instance, differentiating the researcher who uploaded files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of a fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12643</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12643"/>
		<updated>2018-09-12T14:42:47Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Sample Dataverse METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse linked below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset, &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
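&lt;br /&gt;
As a rough illustration of this step, the Python sketch below builds (but does not send) a request for a dataset's JSON, assuming the Native API endpoint /api/datasets/:persistentId/ and the X-Dataverse-key header that Dataverse uses for API tokens. The base URL, token and DOI are hypothetical placeholders, and the Storage Service's own implementation may call different endpoints.&lt;br /&gt;
```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: a real Storage Service Space holds its own
# Dataverse base URL and API token.
BASE_URL = 'https://dataverse.example.org'
API_TOKEN = '00000000-0000-0000-0000-000000000000'


def dataset_request(base_url, api_token, persistent_id):
    # Native API: GET /api/datasets/:persistentId/?persistentId=PID
    query = urllib.parse.urlencode({'persistentId': persistent_id})
    url = base_url.rstrip('/') + '/api/datasets/:persistentId/?' + query
    # Dataverse reads the API token from the X-Dataverse-key header.
    return urllib.request.Request(url, headers={'X-Dataverse-key': api_token})


request = dataset_request(BASE_URL, API_TOKEN, 'doi:10.5072/FK2/0MOPJM')
print(request.full_url)
# urllib.request.urlopen(request) would perform the actual GET.
```
Keeping request construction separate from sending makes it easy to check which endpoint and credentials a Space would use before any network call is made.&lt;br /&gt;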
&lt;br /&gt;
'''[3] Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file lists a data file whose content_type is tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For TSV files, this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
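&lt;br /&gt;
The bundle decision in step [4] can be sketched in Python as follows. The data-file entries are a hypothetical, simplified shape that reuses the field names mentioned above (entity_id, content_type, md5); real Dataverse JSON nests file information more deeply and may name these fields differently. The URL pattern assumes the Dataverse Data Access API bundle endpoint, /api/access/datafile/bundle/{id}.&lt;br /&gt;
```python
TABULAR = 'text/tab-separated-values'


def bundle_urls(base_url, datafiles):
    # Map each tabular data file's entity_id to its bundle download URL;
    # non-tabular files are downloaded individually and are skipped here.
    urls = {}
    for datafile in datafiles:
        if datafile.get('content_type') == TABULAR:
            file_id = datafile['entity_id']
            urls[file_id] = '{}/api/access/datafile/bundle/{}'.format(
                base_url.rstrip('/'), file_id)
    return urls


# Hypothetical entries: a PDF and a tabular file ingested by Dataverse.
datafiles = [
    {'entity_id': 101, 'content_type': 'application/pdf', 'md5': 'placeholder'},
    {'entity_id': 102, 'content_type': TABULAR, 'md5': 'placeholder'},
]
print(bundle_urls('https://dataverse.example.org', datafiles))
```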
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g., original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the JSON file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
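&lt;br /&gt;
The grouping by type can be expressed as a small classification function. This Python sketch is one reading of the fileGrp rules in the metadata sources table further down this page, not Archivematica's code; the is_original_upload flag is a hypothetical input that a caller would derive from the originalFileFormat information in the JSON.&lt;br /&gt;
```python
# Suffixes that mark a file as metadata (JSON, RIS, DDI XML, EndNote XML).
METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_group(filename, in_bundle, is_original_upload):
    # Return the METS fileGrp USE value for one file in a Dataverse transfer.
    if filename.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    if in_bundle and not is_original_upload:
        # Everything else Dataverse generated inside a bundle (TAB, RData, ...).
        return 'derivative'
    # Non-tabular uploads and the original file inside a bundle.
    return 'original'


sample = [
    ('dataset.json', False, False),
    ('Study_info.pdf', False, True),
    ('YVR_weather_data.sav', True, True),
    ('YVR_weather_data.tab', True, False),
    ('YVR_weather_data-ddi.xml', True, False),
]
for name, in_bundle, is_original in sample:
    print(name, file_group(name, in_bundle, is_original))
```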
&lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting Dataverse METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
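&lt;br /&gt;
A minimal sketch of GROUPID assignment, assuming the unpacked bundles are available as a mapping from bundle directory name to file names (a hypothetical input shape). The Group- prefix and the use of random UUIDs are illustrative assumptions; files that did not come from a bundle, such as Study_info.pdf, are simply left without a GROUPID.&lt;br /&gt;
```python
import uuid


def assign_group_ids(bundles):
    # bundles: {bundle directory name: [file names unpacked from that bundle]}
    # (hypothetical input shape). All files from one bundle share a GROUPID;
    # each bundle gets a fresh one.
    group_ids = {}
    for bundle_name, filenames in bundles.items():
        group_id = 'Group-' + str(uuid.uuid4())
        for filename in filenames:
            group_ids[bundle_name + '/' + filename] = group_id
    return group_ids


ids = assign_group_ids({
    'YVR_weather_data': ['YVR_weather_data.sav', 'YVR_weather_data.tab',
                         'YVR_weather_data-ddi.xml'],
})
print(ids)
```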
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing, search and discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
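&lt;br /&gt;
To make the mapping above concrete, here is a Python sketch that builds the study-level DDI elements with xml.etree.ElementTree. It takes a hypothetical, already-flattened dictionary (real Dataverse JSON nests title and author values inside citation fields), concatenates authority and identifier to form ddi:IDNo, and joins versionNumber and versionMinorNumber with a dot. Those two choices, the element nesting and the namespace URI are assumptions, not a description of Archivematica's actual output.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

DDI_NS = 'ddi:codebook:2_5'  # assumed DDI Codebook 2.5 namespace URI
ET.register_namespace('ddi', DDI_NS)


def ddi(tag):
    # Qualify a DDI element name with the namespace, ElementTree style.
    return '{' + DDI_NS + '}' + tag


def build_ddi(ds):
    # ds is a hypothetical, flattened view of the Dataverse dataset JSON.
    code_book = ET.Element(ddi('codeBook'))
    stdy_dscr = ET.SubElement(code_book, ddi('stdyDscr'))
    citation = ET.SubElement(stdy_dscr, ddi('citation'))

    titl_stmt = ET.SubElement(citation, ddi('titlStmt'))
    ET.SubElement(titl_stmt, ddi('titl')).text = ds['title']
    # ddi:IDNo from authority + identifier, agency attribute from protocol.
    id_no = ET.SubElement(titl_stmt, ddi('IDNo'), agency=ds['protocol'])
    id_no.text = ds['authority'] + ds['identifier']

    rsp_stmt = ET.SubElement(citation, ddi('rspStmt'))
    for author in ds['authorNames']:
        ET.SubElement(rsp_stmt, ddi('AuthEnty')).text = author

    dist_stmt = ET.SubElement(citation, ddi('distStmt'))
    ET.SubElement(dist_stmt, ddi('distrbtr')).text = ds['publisher']

    # ddi:version: date from releaseTime, type from versionState.
    ver_stmt = ET.SubElement(citation, ddi('verStmt'))
    version = ET.SubElement(ver_stmt, ddi('version'),
                            date=ds['releaseTime'], type=ds['versionState'])
    version.text = '{}.{}'.format(ds['versionNumber'], ds['versionMinorNumber'])

    use_stmt = ET.SubElement(ET.SubElement(stdy_dscr, ddi('dataAccs')),
                             ddi('useStmt'))
    ET.SubElement(use_stmt, ddi('restrctn')).text = ds['termsOfUse']
    return code_book


# Hypothetical sample values (the first four come from the table above).
sample = {
    'authority': '10.5072/FK2/', 'identifier': '0MOPJM', 'protocol': 'doi',
    'publisher': 'Root Dataverse', 'title': 'Pacific weather patterns study',
    'authorNames': ['Researcher, A.'], 'releaseTime': '2018-05-01T12:00:00Z',
    'versionState': 'RELEASED', 'versionNumber': 1, 'versionMinorNumber': 0,
    'termsOfUse': 'CC0 Waiver',
}
print(ET.tostring(build_ddi(sample), encoding='unicode'))
```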
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
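&lt;br /&gt;
The paired relationships above can be generated with xml.etree.ElementTree, as in the Python sketch below. The namespace URI is an assumption (PREMIS 2, which matches the relatedObjectIdentification element name used in these examples), and the UUIDs are generated placeholders standing in for the identifiers Archivematica assigns to each file.&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

# Assumed PREMIS 2 namespace, matching the element names in the examples.
PREMIS_NS = 'info:lc/xmlns/premis-v2'
ET.register_namespace('premis', PREMIS_NS)


def premis(tag):
    return '{' + PREMIS_NS + '}' + tag


def derivation_relationship(sub_type, related_uuid):
    # One premis:relationship of type derivation pointing at another object.
    rel = ET.Element(premis('relationship'))
    ET.SubElement(rel, premis('relationshipType')).text = 'derivation'
    ET.SubElement(rel, premis('relationshipSubType')).text = sub_type
    related = ET.SubElement(rel, premis('relatedObjectIdentification'))
    ET.SubElement(related, premis('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(related, premis('relatedObjectIdentifierValue')).text = related_uuid
    return rel


# Placeholder UUIDs for the original SAV file and its derivative TAB file.
sav_uuid, tab_uuid = str(uuid.uuid4()), str(uuid.uuid4())
on_sav = derivation_relationship('is source of', tab_uuid)
on_tab = derivation_relationship('has source', sav_uuid)
print(ET.tostring(on_sav, encoding='unicode'))
print(ET.tostring(on_tab, encoding='unicode'))
```
Generating both directions from one helper keeps the two objects' relationship entries symmetric.&lt;br /&gt;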
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
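&lt;br /&gt;
The fixity check itself amounts to recomputing a file's checksum and comparing it with the value Dataverse supplied. The Python sketch below uses MD5 because the metadata table above lists md5 as the checksum in the Dataverse JSON (the sample event records hashlib.sha256(), so the algorithm actually used should be confirmed). The function names and outcome wording are hypothetical, modelled on the sample event.&lt;br /&gt;
```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1024 * 1024):
    # Hash the file in chunks so large data files need not fit in memory.
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, dataverse_md5):
    # Return (eventOutcome, eventOutcomeDetailNote) for a fixity check event.
    actual = md5_of(path)
    if actual == dataverse_md5.lower():
        return 'Pass', 'Dataverse checksum {} verified'.format(dataverse_md5)
    return 'Fail', 'Dataverse checksum {} does not match {}'.format(
        dataverse_md5, actual)


# Usage with a throwaway file containing b'hello'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')
print(fixity_check(tmp.name, '5d41402abc4b2a76b9719d911017c592'))
os.remove(tmp.name)
```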
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
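&lt;br /&gt;
Because the AIP is a BagIt bag, its payload can be re-verified against manifest-sha512.txt once the package has been extracted. The Python sketch below is a simplified check that ignores BagIt details such as percent-encoded file names and the tag manifest; in practice a maintained BagIt library would normally be used instead.&lt;br /&gt;
```python
import hashlib
import os
import tempfile


def verify_manifest(bag_dir, algorithm='sha512'):
    # Return payload paths whose digest differs from manifest-ALGORITHM.txt.
    manifest = os.path.join(bag_dir, 'manifest-' + algorithm + '.txt')
    mismatches = []
    with open(manifest, encoding='utf-8') as lines:
        for line in lines:
            if not line.strip():
                continue
            # Each manifest line is: checksum, whitespace, path inside the bag.
            expected, rel_path = line.rstrip('\r\n').split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, rel_path), 'rb') as payload:
                for chunk in iter(lambda: payload.read(65536), b''):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                mismatches.append(rel_path)
    return mismatches


# Demo on a throwaway one-file bag with hypothetical content.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, 'data'))
with open(os.path.join(demo, 'data', 'file.txt'), 'wb') as f:
    f.write(b'payload')
with open(os.path.join(demo, 'manifest-sha512.txt'), 'w', encoding='utf-8') as f:
    f.write(hashlib.sha512(b'payload').hexdigest() + '  data/file.txt\n')
print(verify_manifest(demo))  # an intact bag yields an empty list
```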
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows:&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. 
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented PMH, and she wasn’t sure it would be a ‘safe’ option. 
Amaz would like to see us using either a standard API (like OAI-PMH) or a versioned API. 
Amaz wondered whether we could use PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or else raise the issue for discussion with the Dataverse team.   
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not yet clear exactly which datasets should be preserved, and she expects this will vary quite a bit by institution. 
We discussed the question of whether all datasets in a dataverse would be preserved (not currently known), which brought up the question of how to relate datasets. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature that needs to be thought through; there could be solutions other than AICs. 
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.
We discussed making this even more specific by adding more agents, for instance distinguishing the researcher who uploaded files from the research data manager who published the dataset. 
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12642</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12642"/>
		<updated>2018-09-12T14:41:38Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Transfer METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
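&lt;br /&gt;
A minimal Python sketch of this retrieval, assuming the Dataverse Native API dataset endpoint (/api/datasets/:persistentId/) and its X-Dataverse-key token header. The helper name build_dataset_request, the server URL and the token are hypothetical, and the Storage Service's own implementation may call different endpoints. It only builds the request, so it runs without network access:&lt;br /&gt;
&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def build_dataset_request(server_url, persistent_id, api_token):
    """Build (but do not send) a GET request for one dataset's JSON.

    Hypothetical helper. Assumes the Dataverse Native API endpoint
    /api/datasets/:persistentId/?persistentId=... and an API token
    passed in the X-Dataverse-key header.
    """
    query = urllib.parse.urlencode({'persistentId': persistent_id})
    url = server_url.rstrip('/') + '/api/datasets/:persistentId/?' + query
    return urllib.request.Request(url, headers={'X-Dataverse-key': api_token})


if __name__ == '__main__':
    # Placeholder server and token; the DOI parts follow the crosswalk example.
    req = build_dataset_request('https://dataverse.example.org',
                                'doi:10.5072/FK2/0MOPJM', 'API-TOKEN')
    print(req.full_url)
    # urllib.request.urlopen(req) would then perform the HTTP call.
```
&lt;br /&gt;
Keeping the token out of the URL and in a header means it does not end up in server access logs alongside the dataset identifier.&lt;br /&gt;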
&lt;br /&gt;
'''[3] Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
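&lt;br /&gt;
The per-file details described above can be pulled out of the JSON with a short Python sketch. It assumes the Dataverse Native API layout (data, latestVersion, files, dataFile) with filename, contentType, md5 and originalFileFormat keys; the paragraph above calls the file identifier entity_id, while this sketch assumes the key is id. The function name list_data_files is hypothetical and the sample ids and checksums are placeholders:&lt;br /&gt;
&lt;br /&gt;
```python
def list_data_files(dataset_json):
    """Summarise each data file listed in a Dataverse dataset JSON document.

    Assumes the layout {'data': {'latestVersion': {'files': [...]}}},
    where each entry holds a 'dataFile' dict. Key names can differ
    between Dataverse versions.
    """
    version = dataset_json['data']['latestVersion']
    summaries = []
    for entry in version.get('files', []):
        data_file = entry['dataFile']
        summaries.append({
            'id': data_file['id'],
            'filename': data_file['filename'],
            'content_type': data_file.get('contentType'),
            'md5': data_file.get('md5'),
            # Only set for tabular files that Dataverse converted to .tab.
            'original_format': data_file.get('originalFileFormat'),
        })
    return summaries


# Illustrative input only: ids and checksums are placeholders.
sample = {'data': {'latestVersion': {'files': [
    {'dataFile': {'id': 11, 'filename': 'Study_info.pdf',
                  'contentType': 'application/pdf',
                  'md5': '0' * 32}},
    {'dataFile': {'id': 12, 'filename': 'YVR_weather_data.tab',
                  'contentType': 'text/tab-separated-values',
                  'md5': 'f' * 32,
                  'originalFileFormat': 'application/x-spss-sav'}},
]}}}

for summary in list_data_files(sample):
    print(summary['id'], summary['filename'], summary['content_type'])
```
&lt;br /&gt;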
&lt;br /&gt;
[4] If the JSON file lists a data file with a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
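&lt;br /&gt;
The choice between a single-file download and a bundle download in step [4] can be sketched in Python as follows. The URL paths /api/access/datafile/ID and /api/access/datafile/bundle/ID are assumed from the Dataverse Data Access API; the function names and the server URL are hypothetical. Because a bundle is a zip archive, the standard zipfile module is enough to list what it contains:&lt;br /&gt;
&lt;br /&gt;
```python
import io
import zipfile

TABULAR_TYPE = 'text/tab-separated-values'


def download_url(server_url, file_id, content_type):
    """Return a Data Access API URL for one data file.

    Tabular files are requested as a zipped bundle and everything else
    individually (endpoint paths assumed, see lead-in).
    """
    base = server_url.rstrip('/') + '/api/access/datafile/'
    if content_type == TABULAR_TYPE:
        return base + 'bundle/' + str(file_id)
    return base + str(file_id)


def bundle_members(zip_bytes):
    """List the files inside a downloaded bundle (a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        return sorted(bundle.namelist())


if __name__ == '__main__':
    server = 'https://dataverse.example.org'
    print(download_url(server, 12, TABULAR_TYPE))
    print(download_url(server, 11, 'application/pdf'))
```
&lt;br /&gt;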
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Dataverse METS file ==&lt;br /&gt;
&lt;br /&gt;
Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes: &lt;br /&gt;
* descriptive metadata about the dataset, mapped to the [https://www.ddialliance.org/Specification/DDI-Codebook/2.5/ DDI standard]&lt;br /&gt;
* a &amp;lt;mets:fileSec&amp;gt; section that lists all files provided, grouped by type (original, metadata or derivative)&lt;br /&gt;
* a &amp;lt;mets:structMap&amp;gt; section that describes the structure of the files as provided by Dataverse (particularly helpful for understanding which files were provided in 'bundles')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
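&lt;br /&gt;
The file grouping described in this list can be sketched in Python. The suffix rule mirrors the fileGrp USE=&amp;quot;metadata&amp;quot; row of the crosswalk table below, and the sample filenames come from the example above; the function names, the suffix tuple and the Group- prefix used for GROUPID values are hypothetical choices for this sketch, not the ingest tool's actual code:&lt;br /&gt;
&lt;br /&gt;
```python
import uuid

METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_use(filename, in_bundle, original_filename=None):
    """Return the fileGrp USE value for one file in a Dataverse transfer.

    Metadata files are recognised by suffix; inside a bundle, only the
    file the researcher uploaded is original and the rest are
    derivatives; files outside a bundle are original.
    """
    if filename.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    if in_bundle and filename != original_filename:
        return 'derivative'
    return 'original'


def assign_group_ids(bundles):
    """Give every file unpacked from the same bundle one shared GROUPID.

    bundles maps a bundle name to the filenames unpacked from it.
    """
    group_ids = {}
    for members in bundles.values():
        group_id = 'Group-' + str(uuid.uuid4())
        for member in members:
            group_ids[member] = group_id
    return group_ids


if __name__ == '__main__':
    sav = 'YVR_weather_data.sav'
    bundle = [sav, 'YVR_weather_data.tab', 'YVR_weather_data.RData',
              'YVR_weather_data-ddi.xml',
              'YVR_weather_datacitation-endnote.xml',
              'YVR_weather_datacitation-ris.ris']
    print('Study_info.pdf', file_use('Study_info.pdf', False))
    print('dataset.json', file_use('dataset.json', False))
    for name in bundle:
        print(name, file_use(name, True, sav))
    print(assign_group_ids({'YVR_weather_data': bundle}))
```
&lt;br /&gt;
Applied to the sample study, this yields the same three groups as the fileSec described above: the PDF and SAV files as original, the TAB and RData files as derivative, and the JSON, DDI, EndNote and RIS files as metadata.&lt;br /&gt;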
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing &amp;amp; search / discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below, but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
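&lt;br /&gt;
As a rough illustration of the crosswalk above, the Python sketch below maps the parsed dataset JSON onto the study-level DDI terms in the table. It assumes Dataverse Native API key names (protocol, authority, identifier, publisher, latestVersion, metadataBlocks), joins authority and identifier as the table's example implies, and returns a plain dictionary rather than DDI XML. The function names and the /@attribute key notation are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
def citation_field(fields, type_name):
    """Return the value of the citation field with the given typeName."""
    for field in fields:
        if field.get('typeName') == type_name:
            return field.get('value')
    return None


def ddi_terms(dataset):
    """Map the data object of a dataset JSON to the DDI terms in the table.

    Authors are assumed to sit in a compound author field whose entries
    each hold an authorName sub-field.
    """
    version = dataset['latestVersion']
    fields = version['metadataBlocks']['citation']['fields']
    authors = citation_field(fields, 'author') or []
    return {
        'titl': citation_field(fields, 'title'),
        'IDNo': dataset['authority'] + dataset['identifier'],
        'IDNo/@agency': dataset['protocol'],
        'AuthEnty': [a['authorName']['value'] for a in authors],
        'distrbtr': dataset.get('publisher'),
        'version': '%s.%s' % (version['versionNumber'],
                              version['versionMinorNumber']),
        'version/@date': version.get('releaseTime'),
        'version/@type': version.get('versionState'),
        'restrctn': version.get('termsOfUse'),
    }


if __name__ == '__main__':
    # Illustrative input: identifiers and publisher follow the table's
    # examples; the other values are placeholders.
    dataset = {
        'protocol': 'doi', 'authority': '10.5072/FK2/', 'identifier': '0MOPJM',
        'publisher': 'Root Dataverse',
        'latestVersion': {
            'versionNumber': 1, 'versionMinorNumber': 0,
            'versionState': 'RELEASED', 'releaseTime': '2018-01-01T00:00:00Z',
            'termsOfUse': 'Placeholder terms of use',
            'metadataBlocks': {'citation': {'fields': [
                {'typeName': 'title', 'value': 'Pacific weather patterns study'},
                {'typeName': 'author', 'value': [
                    {'authorName': {'typeName': 'authorName',
                                    'value': 'Researcher, A.'}}]},
            ]}},
        },
    }
    for term, value in ddi_terms(dataset).items():
        print(term, '=', value)
```
&lt;br /&gt;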
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
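&lt;br /&gt;
The paired relationships above could be generated with Python's standard xml.etree.ElementTree module. This sketch assumes the PREMIS 2.x namespace info:lc/xmlns/premis-v2, which matches the relatedObjectIdentification element used in the examples; the helper names q and derivation_relationship are hypothetical, and the UUIDs are generated only for demonstration:&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

# PREMIS 2.x namespace (assumed, see lead-in).
PREMIS_NS = 'info:lc/xmlns/premis-v2'
ET.register_namespace('premis', PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return '{%s}%s' % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    """Build a premis:relationship of type derivation to another object."""
    rel = ET.Element(q('relationship'))
    ET.SubElement(rel, q('relationshipType')).text = 'derivation'
    ET.SubElement(rel, q('relationshipSubType')).text = sub_type
    ident = ET.SubElement(rel, q('relatedObjectIdentification'))
    ET.SubElement(ident, q('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(ident, q('relatedObjectIdentifierValue')).text = related_uuid
    return rel


if __name__ == '__main__':
    sav_uuid, tab_uuid = str(uuid.uuid4()), str(uuid.uuid4())
    # The original points at its derivative, and the derivative back again.
    for sub_type, target in (('is source of', tab_uuid), ('has source', sav_uuid)):
        print(ET.tostring(derivation_relationship(sub_type, target),
                          encoding='unicode'))
```
&lt;br /&gt;
Emitting both directions keeps each PREMIS object self-describing: a reader of either the SAV or the TAB object's metadata can find its counterpart without consulting the other.&lt;br /&gt;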
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
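&lt;br /&gt;
A minimal sketch of the comparison behind this event, assuming the checksum supplied by Dataverse is the MD5 value listed in the crosswalk table (the eventDetail in the example above names hashlib.sha256(), so the microservice's actual algorithm may differ). The function names md5_of and fixity_check are hypothetical; the returned pair mirrors the eventOutcome and eventOutcomeDetailNote values shown above:&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, dataverse_md5):
    """Compare a file against the checksum Dataverse supplied for it.

    Returns (eventOutcome, eventOutcomeDetailNote) values.
    """
    actual = md5_of(path)
    if actual == dataverse_md5.strip().lower():
        return 'Pass', 'Dataverse checksum %s verified' % dataverse_md5
    return 'Fail', 'Dataverse checksum %s does not match computed %s' % (
        dataverse_md5, actual)
```
&lt;br /&gt;
Reading in fixed-size chunks keeps memory use flat even for large data files.&lt;br /&gt;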
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
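&lt;br /&gt;
Once a stored AIP has been extracted (the .7z in the example would first need a 7-Zip-capable tool), its payload can be checked against the manifest. A minimal Python sketch, assuming the BagIt manifest format of one checksum followed by whitespace and a path relative to the bag root; the function name verify_manifest is hypothetical, and this sketch does not check bagit.txt, bag-info.txt or tagmanifest-md5.txt:&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os


def verify_manifest(bag_dir, manifest_name='manifest-sha512.txt'):
    """Check every payload file listed in a BagIt manifest.

    The hash algorithm is taken from the manifest filename
    (manifest-ALGORITHM.txt). Returns the paths whose checksum
    does not match.
    """
    algorithm = manifest_name[len('manifest-'):-len('.txt')]
    mismatches = []
    with open(os.path.join(bag_dir, manifest_name), encoding='utf-8') as manifest:
        for line in manifest:
            if not line.strip():
                continue
            expected, rel_path = line.strip().split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, rel_path), 'rb') as payload:
                for chunk in iter(lambda: payload.read(65536), b''):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                mismatches.append(rel_path)
    return mismatches
```
&lt;br /&gt;
An empty list means every file under data/ still matches the checksum recorded when the bag was created.&lt;br /&gt;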
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section collects working notes for future phases as interesting opportunities or questions arise. At the end of the current phase we will document the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like us to use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not yet clear exactly which datasets should be preserved, and expects this will vary considerably by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents; for instance, differentiating the researcher who uploaded files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12641</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12641"/>
		<updated>2018-09-12T14:30:29Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Sample transfer METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board] linked below:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to Dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
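As an illustration of step [2], the sketch below builds, but does not send, a request for one dataset. It assumes the Dataverse Native API path `/api/datasets/:persistentId/` and the `X-Dataverse-key` token header; the server URL, DOI and token are placeholders, and `build_dataset_request` is a hypothetical helper rather than Storage Service code.&lt;br /&gt;

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_dataset_request(server: str, persistent_id: str, api_token: str) -> Request:
    """Build (but do not send) a Native API request for one dataset.

    The token stands in for the credentials kept in the Storage Service Space.
    """
    query = urlencode({"persistentId": persistent_id})
    url = f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"
    # Dataverse reads the API token from the X-Dataverse-key header.
    return Request(url, headers={"X-Dataverse-key": api_token})


# Placeholder server, DOI and token.
req = build_dataset_request("https://dataverse.example.org", "doi:10.5072/FK2/0MOPJM", "xxxx")
print(req.full_url)
```

Sending it (for example with `urllib.request.urlopen`) would return the dataset JSON described in step [3].&lt;br /&gt;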
&lt;br /&gt;
[3] '''Prepare Transfer'''&lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
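A minimal sketch of reading the per-file information out of dataset.json. The sample JSON is hypothetical and trimmed; it assumes the Dataverse 4.x layout (`latestVersion`, `files`, `dataFile`) with an `id` field standing in for the entity_id described above, and `list_data_files` is an illustrative helper, not part of Archivematica.&lt;br /&gt;

```python
import json

# Hypothetical, trimmed dataset.json. The nesting (latestVersion -> files ->
# dataFile) is modelled on the Dataverse 4.x Native API; the values are invented.
SAMPLE = """
{
  "identifier": "0MOPJM",
  "latestVersion": {
    "versionNumber": 1,
    "versionMinorNumber": 0,
    "files": [
      {"dataFile": {"id": 101, "filename": "Study_info.pdf",
                    "contentType": "application/pdf",
                    "md5": "0cc175b9c0f1b6a831c399e269772661"}},
      {"dataFile": {"id": 102, "filename": "YVR_weather_data.tab",
                    "contentType": "text/tab-separated-values",
                    "originalFileFormat": "application/x-spss-sav",
                    "md5": "92eb5ffee6ae2fec3ad71c777531578f"}}
    ]
  }
}
"""


def list_data_files(dataset_json: str) -> list[dict]:
    """Return the id, filename, content type and md5 of each data file."""
    dataset = json.loads(dataset_json)
    files = []
    for entry in dataset["latestVersion"]["files"]:
        data_file = entry["dataFile"]
        files.append({
            "id": data_file["id"],
            "filename": data_file["filename"],
            "content_type": data_file["contentType"],
            # Not every file is guaranteed to carry an md5 value.
            "md5": data_file.get("md5"),
        })
    return files


for info in list_data_files(SAMPLE):
    print(info["id"], info["filename"], info["md5"])
```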
&lt;br /&gt;
[4] If the json file lists a data file with a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For such tabular files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
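The decision in step [4] could look like the following sketch. It assumes the Dataverse Data Access API paths `/api/access/datafile/{id}` (single file) and `/api/access/datafile/bundle/{id}` (zipped bundle), and treats a `contentType` of `text/tab-separated-values` as the trigger; `download_url` and the server URL are hypothetical.&lt;br /&gt;

```python
TAB_CONTENT_TYPE = "text/tab-separated-values"


def download_url(server: str, data_file: dict) -> str:
    """Pick a Data Access API URL for one file listed in dataset.json.

    Tabular files are fetched as a zipped bundle; every other file is
    fetched on its own.
    """
    base = server.rstrip("/")
    file_id = data_file["id"]
    if data_file.get("contentType") == TAB_CONTENT_TYPE:
        return f"{base}/api/access/datafile/bundle/{file_id}"
    return f"{base}/api/access/datafile/{file_id}"


print(download_url("https://dataverse.example.org", {"id": 102, "contentType": TAB_CONTENT_TYPE}))
```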
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file.&lt;br /&gt;
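To make step [5] concrete, here is a skeletal transfer METS built with Python's standard `xml.etree.ElementTree`. It shows only a dmdSec with an mdRef to dataset.json and a fileSec with one fileGrp per USE value; the ID scheme, the `OTHERMDTYPE` value and the helper names are assumptions, and a real transfer METS also carries the DDI dmdSec and a structMap, which are omitted here.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_transfer_mets(groups: dict[str, list[str]]) -> ET.Element:
    """Build a skeletal transfer METS tree.

    groups maps a USE value (e.g. "original") to the relative paths in that
    fileGrp. Only one dmdSec (an mdRef to dataset.json) is produced here.
    """
    mets = ET.Element(m("mets"))
    # ID follows the sample further down, where dmdSec_2 points at the JSON.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="dmdSec_2")
    ET.SubElement(dmd, m("mdRef"), {
        "LOCTYPE": "OTHER",
        "OTHERLOCTYPE": "SYSTEM",
        "MDTYPE": "OTHER",
        "OTHERMDTYPE": "JSON",  # assumed label for dataset.json
        f"{{{XLINK_NS}}}href": "dataset.json",
    })
    file_sec = ET.SubElement(mets, m("fileSec"))
    counter = 0
    for use, paths in groups.items():
        grp = ET.SubElement(file_sec, m("fileGrp"), USE=use)
        for path in paths:
            counter += 1
            file_el = ET.SubElement(grp, m("file"), ID=f"file_{counter}")
            ET.SubElement(file_el, m("FLocat"), {
                "LOCTYPE": "OTHER",
                "OTHERLOCTYPE": "SYSTEM",
                f"{{{XLINK_NS}}}href": path,
            })
    return mets


root = build_transfer_mets({
    "original": ["Study_info.pdf", "YVR_weather_data.sav"],
    "derivative": ["YVR_weather_data.tab"],
    "metadata": ["dataset.json"],
})
print(ET.tostring(root, encoding="unicode"))
```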
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step [5].&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
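For the verify checksums micro-service, the comparison against the md5 values from dataset.json can be sketched with Python's `hashlib`, reading files in chunks so large data files need not fit in memory. This is an illustration, not Archivematica's implementation; the function names are hypothetical.&lt;br /&gt;

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 65536) -> str:
    """Compute an MD5 hex digest without reading the whole file into memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(stream, expected_md5: str) -> bool:
    """Compare a locally computed MD5 with the value supplied by Dataverse."""
    return md5_of_stream(stream) == expected_md5.lower()


# Usage with an in-memory file; a real transfer would open each file in "rb" mode.
print(verify_checksum(io.BytesIO(b"a"), "0CC175B9C0F1B6A831C399E269772661"))  # True
```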
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample Dataverse METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups: USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
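The fileGrp and GROUPID rules above (and in the mapping table further down) can be sketched as a small classifier. It assumes the name of the researcher's original file in each bundle is already known (for example via `originalFileFormat` in dataset.json), and the `Group-` prefix for GROUPID values is a hypothetical choice; `file_use` and `classify_bundle` are illustrative names.&lt;br /&gt;

```python
import uuid
from typing import Optional

# Suffixes that the mapping table assigns to USE="metadata".
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_use(name: str, original_name: Optional[str] = None) -> str:
    """Return the fileGrp USE value for one file in the transfer.

    Files outside a bundle pass original_name=None and are treated as
    originals unless they are metadata files.
    """
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if original_name is None or name == original_name:
        return "original"
    return "derivative"


def classify_bundle(members: list[str], original_name: str) -> list[dict]:
    """Classify every file unpacked from one bundle under a shared GROUPID."""
    group_id = f"Group-{uuid.uuid4()}"  # hypothetical GROUPID format
    return [
        {"name": name, "use": file_use(name, original_name), "groupid": group_id}
        for name in members
    ]


for row in classify_bundle(
    ["YVR_weather_data.sav", "YVR_weather_data.tab", "YVR_weather_data-ddi.xml"],
    "YVR_weather_data.sav",
):
    print(row["name"], row["use"])
```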
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
The table below shows how elements in the METS files are populated from metadata or files provided with Dataverse Datasets. &lt;br /&gt;
&lt;br /&gt;
More metadata from Dataverse could be mapped into the METS files. Scholars Portal would like to see more metadata in the AIP to enable better indexing &amp;amp; search / discovery of datasets. To show which fields could be used, we took a version of the Dataverse metadata crosswalk and created our own version that includes Archivematica. The [https://docs.google.com/spreadsheets/d/18Xn4yR-nvbZV5lfrxVNQ8GHM18ilZ_IPocP9UeOtCY4/edit?usp=sharing Dataverse 4.0+ to Archivematica Metadata Crosswalk] provides the same details as the table below but also highlights additional fields that should ultimately be mapped into METS.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
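A sketch of the first few rows of the mapping table, building the corresponding DDI elements from a hypothetical, trimmed dataset.json. For brevity it places `titl`, `IDNo` and `distrbtr` directly under `codeBook`, without the DDI namespace or the nesting under `stdyDscr` that a real DDI Codebook uses, and it concatenates `authority` and `identifier` as in the json example in the table; `citation_value` and `ddi_stub` are illustrative names.&lt;br /&gt;

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical excerpt of dataset.json; field names follow the mapping table.
SAMPLE = json.loads("""
{
  "protocol": "doi",
  "authority": "10.5072/FK2/",
  "identifier": "0MOPJM",
  "publisher": "Root Dataverse",
  "latestVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {"typeName": "title", "value": "Pacific weather patterns study"}
        ]
      }
    }
  }
}
""")


def citation_value(dataset: dict, type_name: str):
    """Return the value of a citation-block field by its typeName, or None."""
    fields = dataset["latestVersion"]["metadataBlocks"]["citation"]["fields"]
    for field in fields:
        if field["typeName"] == type_name:
            return field["value"]
    return None


def ddi_stub(dataset: dict) -> ET.Element:
    """Build a flat, namespace-free stub of a few DDI elements."""
    codebook = ET.Element("codeBook")
    ET.SubElement(codebook, "titl").text = citation_value(dataset, "title")
    # ddi:IDNo text from authority + identifier; agency attribute from protocol.
    id_no = ET.SubElement(codebook, "IDNo", agency=dataset["protocol"])
    id_no.text = dataset["authority"] + dataset["identifier"]
    ET.SubElement(codebook, "distrbtr").text = dataset["publisher"]
    return codebook


print(ET.tostring(ddi_stub(SAMPLE), encoding="unicode"))
```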
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
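The pair of relationships shown in these examples can be generated with `xml.etree.ElementTree`. This minimal sketch reuses the element names from the examples and assumes the PREMIS 2 namespace `info:lc/xmlns/premis-v2` (adjust both if the AIP METS uses PREMIS 3); the helper names are hypothetical, and only the relationship is emitted, not the derivation event or agent.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 2 namespace; element names mirror the examples above.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def p(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace (Clark notation)."""
    return f"{{{PREMIS_NS}}}{tag}"


def derivation_relationship(sub_type: str, related_uuid: str) -> ET.Element:
    """Build one premis:relationship of type 'derivation'."""
    rel = ET.Element(p("relationship"))
    ET.SubElement(rel, p("relationshipType")).text = "derivation"
    ET.SubElement(rel, p("relationshipSubType")).text = sub_type
    related = ET.SubElement(rel, p("relatedObjectIdentification"))
    ET.SubElement(related, p("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(related, p("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def link_original_and_derivative(original_uuid: str, derivative_uuid: str):
    """Return (relationship for the original, relationship for the derivative)."""
    return (
        derivation_relationship("is source of", derivative_uuid),
        derivation_relationship("has source", original_uuid),
    )


for rel in link_original_and_derivative("[SPSS SAV file UUID]", "[TAB file UUID]"):
    print(ET.tostring(rel, encoding="unicode"))
```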
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
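A sketch of how the values for such a fixity check event might be assembled. It uses MD5 rather than the SHA-256 module shown in the example, because the mapping table lists the Dataverse checksum as md5; the returned dict layout and the function name are hypothetical, and serialising the values to PREMIS XML is left out.&lt;br /&gt;

```python
import hashlib
import uuid
from datetime import datetime, timezone


def fixity_check_event(data: bytes, dataverse_md5: str) -> dict:
    """Summarise a fixity check of one file against its Dataverse MD5."""
    computed = hashlib.md5(data).hexdigest()
    passed = computed == dataverse_md5.lower()
    if passed:
        note = f"Dataverse checksum {dataverse_md5} verified"
    else:
        note = f"Dataverse checksum {dataverse_md5} does not match computed {computed}"
    return {
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": 'program="python"; module="hashlib.md5()"',
        "eventOutcome": "Pass" if passed else "Fail",
        "eventOutcomeDetailNote": note,
    }


print(fixity_check_event(b"a", "0cc175b9c0f1b6a831c399e269772661")["eventOutcome"])
```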
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
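For reference, entries in manifest-sha512.txt follow the BagIt convention of one line per payload file: the checksum, whitespace, then the path relative to the bag root. The sketch below computes such lines for a bag's data directory; `manifest_lines` is illustrative only and is not the tooling Archivematica uses to create bags.&lt;br /&gt;

```python
import hashlib
import os
import tempfile


def manifest_lines(bag_dir: str, algorithm: str = "sha512") -> list[str]:
    """Return BagIt manifest lines for every file under bag_dir/data.

    Each line is the hex digest, two spaces, then the path relative to the
    bag root using forward slashes (e.g. "data/objects/file.tab").
    """
    entries = []
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(path, "rb") as fh:
                # Hash in chunks so large payload files are not read at once.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            rel_path = os.path.relpath(path, bag_dir).replace(os.sep, "/")
            entries.append((rel_path, digest.hexdigest()))
    # Sort by path so the manifest is stable between runs.
    return [f"{hexdigest}  {rel_path}" for rel_path, hexdigest in sorted(entries)]


if __name__ == "__main__":
    # Usage: build a throwaway bag with one payload file and print its manifest.
    with tempfile.TemporaryDirectory() as bag:
        os.makedirs(os.path.join(bag, "data", "objects"))
        with open(os.path.join(bag, "data", "objects", "Weather_data.tab"), "wb") as fh:
            fh.write(b"a")
        for line in manifest_lines(bag):
            print(line)
```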
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   ├── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between its various files. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the dataverse API guide). Amber said there were idiosyncrasies in the way dataverse implemented PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are either using a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to see whether we could use the Native API (even if we don’t actually use it), or we need to raise it as an issue to discuss with the dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance differentiating between the researcher who uploaded the files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12628</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12628"/>
		<updated>2018-09-06T20:28:40Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete and QA is in progress. The changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of the work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
This [https://drive.google.com/open?id=1XlHZF2Sryg_79qzw7G-R4PeWmMcPgRug screencast] provides a demonstration of the current implementation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Dataverse Workflow===&lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse_Workflow_overview.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[1] '''User Selects Dataset''' &lt;br /&gt;
When the Storage Service is configured to connect to Dataverse, the Transfer Browser in the Dashboard will display a list of all Dataverse Transfer Source Locations. Transfer Source locations can be configured to filter on search terms, or on a particular dataverse. See (TODO - add link to SS documentation). Users can browse through the datasets available, select one and set the Transfer type to Dataverse. &lt;br /&gt;
&lt;br /&gt;
[2] '''Storage Service Retrieves Dataset'''&lt;br /&gt;
The Storage Service uses the Dataverse API to retrieve the selected dataset. API credentials are stored in the Storage Service Space. &lt;br /&gt;
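&lt;br /&gt;
A minimal Python sketch of how such a request could be built, assuming the Dataverse Native API endpoint for fetching a dataset by persistent identifier (/api/datasets/:persistentId/) and the X-Dataverse-key header for the API token. The base URL and token are hypothetical, and the Storage Service may construct its requests differently. The request is only built here, not sent.&lt;br /&gt;
&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def build_dataset_request(base_url, persistent_id, api_key):
    """Build (but do not send) a request for one dataset's metadata.

    Assumes the Native API path /api/datasets/:persistentId/ and the
    X-Dataverse-key header; both are assumptions for this sketch.
    """
    # Keep ":" and "/" readable in the DOI; other characters are percent-encoded.
    query = urllib.parse.urlencode({"persistentId": persistent_id}, safe=":/")
    url = base_url.rstrip("/") + "/api/datasets/:persistentId/?" + query
    # The token from the Storage Service Space travels in a header, not the URL.
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_key})


req = build_dataset_request(
    "https://dataverse.example.org", "doi:10.5072/FK2/0MOPJM", "HYPOTHETICAL-TOKEN"
)
print(req.full_url)
```
&lt;br /&gt;
Passing the request to urllib.request.urlopen would fetch the dataset JSON. Keeping the token in a header rather than the query string keeps it out of URLs that may end up in logs.&lt;br /&gt;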
&lt;br /&gt;
'''[3] Prepare Transfer''' &lt;br /&gt;
&lt;br /&gt;
The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
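&lt;br /&gt;
The json file can be reduced to a per-file inventory before any downloads happen. A minimal Python sketch, assuming the Dataverse 4 Native API layout in which each entry under latestVersion, files holds a dataFile object with id, filename, contentType and md5 keys. These key names are assumptions (the entity_id values described above may appear under a different name), and the sample data is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def file_inventory(dataset_json_text):
    """Map each data file name to its Dataverse id, content type and MD5.

    The key path latestVersion, files, dataFile is an assumed layout,
    not taken from Archivematica's code.
    """
    dataset = json.loads(dataset_json_text)
    inventory = {}
    for entry in dataset["latestVersion"]["files"]:
        datafile = entry["dataFile"]
        inventory[datafile["filename"]] = {
            "id": datafile["id"],
            "contentType": datafile.get("contentType"),
            "md5": datafile.get("md5"),  # None if Dataverse supplied no checksum
        }
    return inventory


# Hypothetical, trimmed-down dataset.json used only for illustration.
SAMPLE = json.dumps({"latestVersion": {"files": [
    {"dataFile": {"id": 11, "filename": "Study_info.pdf",
                  "contentType": "application/pdf", "md5": "PLACEHOLDER-MD5-A"}},
    {"dataFile": {"id": 12, "filename": "YVR_weather_data.tab",
                  "contentType": "text/tab-separated-values"}},
]}})

for name, info in file_inventory(SAMPLE).items():
    print(name, info["id"], info["md5"])
```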
&lt;br /&gt;
[4] If the json file lists a data file with a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For such tabular files this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
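&lt;br /&gt;
The choice between a single-file download and a bundle download can be sketched in Python as follows. The Access API paths (/api/access/datafile/ID and /api/access/datafile/bundle/ID) and the content-type string text/tab-separated-values are assumptions based on the Dataverse Data Access API guide, not taken from Archivematica's code.&lt;br /&gt;
&lt;br /&gt;
```python
TSV = "text/tab-separated-values"


def download_url(base_url, file_id, content_type):
    """Pick the Access API URL for one data file (paths are assumptions).

    Tabular files that Dataverse has ingested are served as tab-separated
    values, so they are requested as a zipped bundle; any other file is
    requested on its own.
    """
    base = base_url.rstrip("/")
    # Ignore parameters such as "; charset=UTF-8" when comparing types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == TSV:
        return "{}/api/access/datafile/bundle/{}".format(base, file_id)
    return "{}/api/access/datafile/{}".format(base, file_id)


print(download_url("https://dataverse.example.org", 12, "text/tab-separated-values"))
print(download_url("https://dataverse.example.org", 11, "application/pdf"))
```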
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual].&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g., original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and to provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step [5].&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
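&lt;br /&gt;
The grouping described above can be sketched as a hypothetical Python helper (not the ingest script's code). Every file unpacked from one bundle shares a single generated GROUPID. Each file also gets a fileGrp USE value from the suffix rules in the metadata sources table: .json, .ris, -ddi.xml and -endnote.xml files are metadata, the researcher's upload is original, and everything else in the bundle is derivative. The Group- prefix on the identifier is an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid

# Suffixes treated as metadata, per the metadata sources table.
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_use(name, original_name):
    """Return the fileGrp USE value for one file unpacked from a bundle."""
    if name == original_name:
        return "original"
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    return "derivative"


def group_bundle(bundle_files, original_name):
    """Give every file from one bundle the same GROUPID (format is hypothetical)."""
    group_id = "Group-" + str(uuid.uuid4())
    return {name: {"USE": file_use(name, original_name), "GROUPID": group_id}
            for name in bundle_files}


bundle = ["YVR_weather_data.sav", "YVR_weather_data.tab", "YVR_weather_data.RData",
          "YVR_weather_data-ddi.xml", "YVR_weather_datacitation-endnote.xml",
          "YVR_weather_datacitation-ris.ris"]
for name, attrs in group_bundle(bundle, "YVR_weather_data.sav").items():
    print(name, attrs["USE"])
```
&lt;br /&gt;
Generating a fresh identifier per call means a transfer containing several bundles automatically gets one GROUPID per bundle.&lt;br /&gt;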
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
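&lt;br /&gt;
A minimal Python sketch of the DDI mappings in the table above, using xml.etree.ElementTree. It assumes the Dataverse 4 Native API JSON layout: a citation metadata block whose fields carry typeName and value, with authorName as a subfield of author. It emits the DDI elements as flat children of stdyDscr, whereas a real DDI Codebook record nests them more deeply. All sample values are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def citation_value(version, type_name):
    """Return a citation-block field value by typeName (assumed JSON layout)."""
    for field in version["metadataBlocks"]["citation"]["fields"]:
        if field["typeName"] == type_name:
            return field["value"]
    return None


def build_study_ddi(dataset):
    """Build a flat, simplified set of DDI elements following the table.

    Real DDI Codebook nests these under citation, titlStmt, rspStmt and so
    on; the flat layout and the JSON key paths are assumptions.
    """
    version = dataset["latestVersion"]
    study = ET.Element("stdyDscr")
    ET.SubElement(study, "titl").text = citation_value(version, "title")
    idno = ET.SubElement(study, "IDNo", agency=dataset["protocol"])
    idno.text = dataset["authority"].rstrip("/") + "/" + dataset["identifier"]
    for author in citation_value(version, "author") or []:
        ET.SubElement(study, "AuthEnty").text = author["authorName"]["value"]
    ET.SubElement(study, "distrbtr").text = dataset["publisher"]
    ver = ET.SubElement(study, "version", date=version["releaseTime"],
                        type=version["versionState"])
    ver.text = "{}.{}".format(version["versionNumber"], version["versionMinorNumber"])
    ET.SubElement(study, "restrctn").text = version.get("termsOfUse")
    return study


SAMPLE = {  # hypothetical values shaped like the json examples in the table
    "protocol": "doi", "authority": "10.5072/FK2/", "identifier": "0MOPJM",
    "publisher": "Root Dataverse",
    "latestVersion": {
        "releaseTime": "2018-05-01T00:00:00Z", "versionState": "RELEASED",
        "versionNumber": 1, "versionMinorNumber": 0,
        "termsOfUse": "CC0 Waiver",
        "metadataBlocks": {"citation": {"fields": [
            {"typeName": "title", "value": "Pacific weather patterns study"},
            {"typeName": "author",
             "value": [{"authorName": {"value": "Researcher, A."}}]},
        ]}},
    },
}
print(ET.tostring(build_study_ddi(SAMPLE), encoding="unicode"))
```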
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
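&lt;br /&gt;
Because the two relationships mirror each other, they can be generated as a pair. A minimal Python sketch using xml.etree.ElementTree, assuming the PREMIS 3 namespace http://www.loc.gov/premis/v3 and the is source of / has source subtypes shown in the examples above. The helper names and UUID values are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def premis(tag):
    """Qualify a tag name with the PREMIS namespace (Clark notation)."""
    return "{%s}%s" % (PREMIS_NS, tag)


def derivation_relationship(subtype, related_uuid):
    """Build one premis:relationship pointing at the related object's UUID."""
    rel = ET.Element(premis("relationship"))
    ET.SubElement(rel, premis("relationshipType")).text = "derivation"
    ET.SubElement(rel, premis("relationshipSubType")).text = subtype
    ident = ET.SubElement(rel, premis("relatedObjectIdentification"))
    ET.SubElement(ident, premis("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(ident, premis("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def link_pair(original_uuid, derivative_uuid):
    """Return (relationship for the original, relationship for the derivative)."""
    return (derivation_relationship("is source of", derivative_uuid),
            derivation_relationship("has source", original_uuid))


sav_rel, tab_rel = link_pair("SAV-UUID-PLACEHOLDER", "TAB-UUID-PLACEHOLDER")
print(ET.tostring(sav_rel, encoding="unicode"))
```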
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
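&lt;br /&gt;
A minimal Python sketch of the comparison behind this event. The example above records hashlib.sha256(), but the metadata sources table takes the Dataverse checksum from the json md5 key, so this sketch assumes MD5. The returned dictionary only mirrors the PREMIS element names and is not Archivematica's data model.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import io


def md5_hex(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large data files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_check_event(stream, dataverse_md5):
    """Summarise a fixity check against the checksum Dataverse supplied.

    The keys mirror the PREMIS elements in the example; the dict itself is
    an illustrative structure only.
    """
    actual = md5_hex(stream)
    expected = dataverse_md5.strip().lower()
    passed = actual == expected
    note = "Dataverse checksum {} {}".format(
        expected, "verified" if passed else "does not match computed " + actual)
    return {
        "eventType": "fixity check",
        "eventDetail": 'program="python"; module="hashlib.md5()"',
        "eventOutcome": "Pass" if passed else "Fail",
        "eventOutcomeDetailNote": note,
    }


print(fixity_check_event(io.BytesIO(b"a"), "0cc175b9c0f1b6a831c399e269772661"))
```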
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
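&lt;br /&gt;
A minimal Python sketch of unpacking one bundle into its own directory, using the standard zipfile module. It assumes the directory is named after the bundle, as with Weather_data in the sample structure. The function name and the in-memory sample bundle are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import io
import tempfile
import zipfile
from pathlib import Path


def unpack_bundle(zip_source, bundle_name, objects_dir):
    """Extract one Dataverse bundle into its own directory under objects.

    zip_source may be a path or a binary file object. ZipFile.extractall
    drops absolute paths and ".." components from member names.
    """
    target = Path(objects_dir) / bundle_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_source) as bundle:
        bundle.extractall(target)
        members = sorted(bundle.namelist())
    return target, members


# Build a tiny in-memory zip standing in for a real Dataverse bundle.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Weather_data.tab", "year\ttemp\n2015\t11.2\n")
    zf.writestr("Weather_data-ddi.xml", "placeholder")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    directory, names = unpack_bundle(buffer, "Weather_data", Path(tmp) / "objects")
    print(directory.name, names)
```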
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
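&lt;br /&gt;
Once an AIP has been extracted from its archive, the payload can be checked against manifest-sha512.txt. A minimal Python sketch, assuming the BagIt manifest format of one checksum followed by whitespace and a path relative to the bag root. A complete validator (for example the Library of Congress bagit-python package) performs further checks, such as the tag manifest and payload files missing from the manifest. This sketch does not.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm):
    """Hash one file in chunks with the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm="sha512"):
    """Return manifest entries whose payload file is missing or altered.

    Each manifest line is a checksum, whitespace, then a path relative to
    the bag root. Percent-encoded paths and payload files absent from the
    manifest are not handled here.
    """
    bag = Path(bag_dir)
    manifest = bag / "manifest-{}.txt".format(algorithm)
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        payload = bag / rel_path
        if not payload.is_file() or file_digest(payload, algorithm) != expected.lower():
            failures.append(rel_path)
    return failures


# Build a one-file, already-extracted bag to exercise the check.
bag_root = Path(tempfile.mkdtemp())
(bag_root / "data").mkdir()
(bag_root / "data" / "METS.xml").write_bytes(b"example")
checksum = hashlib.sha512(b"example").hexdigest()
(bag_root / "manifest-sha512.txt").write_text(checksum + "  data/METS.xml\n", encoding="utf-8")
print(verify_manifest(bag_root))
```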
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the dataverse API guide). Amber said there were idiosyncrasies in the way dataverse implemented PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are either using a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to see whether we could use the Native API (even if we don’t actually use it), or we need to raise it as an issue to discuss with the dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a Dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents; for instance, differentiating between the researcher who uploaded the files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include the equivalent of a fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=File:Dataverse_Workflow_overview.png&amp;diff=12627</id>
		<title>File:Dataverse Workflow overview.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=File:Dataverse_Workflow_overview.png&amp;diff=12627"/>
		<updated>2018-09-06T20:15:59Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: High level summary of Dataverse datasets are preserved using Archivematica&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;High-level summary of how Dataverse datasets are preserved using Archivematica&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12626</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12626"/>
		<updated>2018-09-06T20:12:21Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (v1.8).  This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''September 6, 2018'''&lt;br /&gt;
Development work is almost complete. QA is in progress. Changes are scheduled to be included in version 1.8 of Archivematica. To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. The draft versions &amp;amp; comments are documented in this [https://docs.google.com/document/d/1KqhpTuiSY2_B5oAM1cgXHAA72hmiUa8SBh4laylTkGo/edit feature file]. &lt;br /&gt;
&lt;br /&gt;
'''Feature: Preserve a Dataverse dataset''' &lt;br /&gt;
 &lt;br /&gt;
  Alma is an Archivematica user &lt;br /&gt;
  And they want to preserve a dataset published in a Dataverse&lt;br /&gt;
    ''Definitions''  &lt;br /&gt;
    Dataverse Dataset: A dataset that has been published in a Dataverse, including all &lt;br /&gt;
    original files uploaded to dataverse, and any derivative files created by Dataverse.  &lt;br /&gt;
    Dataverse METS: A metadata file using the METS standard that describes a dataset; &lt;br /&gt;
    including descriptive metadata, list of all objects in the dataset, their structure &lt;br /&gt;
    and relationships to each other. &lt;br /&gt;
  ''Scenario: Manual Selection of Dataset''&lt;br /&gt;
    Given the Storage Service is configured to connect to a Dataverse Repository &lt;br /&gt;
      And the dataset has been published in Dataverse &lt;br /&gt;
  When the user selects the transfer type “Dataverse” &lt;br /&gt;
    And the user selects the dataset to be preserved  &lt;br /&gt;
    And the user enters the &amp;lt;Transfer Name&amp;gt;&lt;br /&gt;
    And the user enters the (optional) &amp;lt;Accession number&amp;gt; &lt;br /&gt;
    And the user clicks the “Start Transfer” button&lt;br /&gt;
  Then Archivematica copies the files from Dataverse to a local processing directory   &lt;br /&gt;
    And the Approve Transfer microservice asks the user to approve the transfer&lt;br /&gt;
    And the user selects yes &lt;br /&gt;
    And the Verify Transfer Compliance microservice creates the Dataverse METS&lt;br /&gt;
    And the Dataverse metadata files are generated and included in a metadata directory &lt;br /&gt;
    And the Verify Transfer Compliance microservice confirms this is a valid Dataverse Transfer&lt;br /&gt;
    And the Verify Transfer Checksums microservice confirms the checksums provided by Dataverse match those generated for each file in the dataset&lt;br /&gt;
    And the AIP METS file includes the Dataverse-generated events&lt;br /&gt;
    And the completed AIP is stored in the specified Dataverse storage location&lt;br /&gt;
 &lt;br /&gt;
===Workflow diagram===&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse - Archivematica workflow_1.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram notes===&lt;br /&gt;
&lt;br /&gt;
[1] &amp;quot;Ingest script&amp;quot; refers to an [https://github.com/artefactual/automation-tools automation tool] designed to automate ingest into Archivematica for bulk processing. An existing automation tool would be modified to accomplish the tasks described in the workflow.&lt;br /&gt;
&lt;br /&gt;
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.&lt;br /&gt;
&lt;br /&gt;
[3] The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If a data file in the JSON file has a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the below files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
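Notes [3] and [4] together decide how each data file is fetched. The Python sketch below is illustrative only: it assumes a simplified dataset.json layout (a files list whose dataFile entries carry id, md5 and contentType) and treats a contentType of text/tab-separated-values as the signal to request a bundle. The JSON shape, the function name plan_downloads and the sample values are hypothetical, not the ingest script's actual code.&lt;br /&gt;

```python
import json

# Assumption: the content type Dataverse reports for files it has ingested as tabular data.
TABULAR_CONTENT_TYPE = "text/tab-separated-values"


def plan_downloads(dataset_json: str) -> list:
    """Summarise the data files listed in a simplified dataset.json.

    Assumed (hypothetical) layout, loosely modelled on a Dataverse dataset version:
    {"files": [{"label": ..., "dataFile": {"id": ..., "md5": ..., "contentType": ...}}]}
    """
    version = json.loads(dataset_json)
    plan = []
    for entry in version.get("files", []):
        data_file = entry["dataFile"]
        is_tabular = data_file.get("contentType") == TABULAR_CONTENT_TYPE
        plan.append({
            "label": entry["label"],
            "file_id": data_file["id"],
            "md5": data_file["md5"],
            # Tabular files are requested as a zipped bundle, all others one by one.
            "download": "bundle" if is_tabular else "single",
        })
    return plan


sample = json.dumps({"files": [
    {"label": "Study_info.pdf",
     "dataFile": {"id": 11, "md5": "md5-placeholder-1", "contentType": "application/pdf"}},
    {"label": "YVR_weather_data.tab",
     "dataFile": {"id": 12, "md5": "md5-placeholder-2", "contentType": "text/tab-separated-values"}},
]})
for item in plan_downloads(sample):
    print(item["label"], item["download"])
```

Keeping the expected md5 alongside each entry lets the same plan feed the later checksum verification step.&lt;br /&gt;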
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different filetypes and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
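The fileGrp and GROUPID rules described above can be sketched in Python as follows. This is a hypothetical illustration, not the ingest script: it classifies files by name ending (the metadata suffixes come from the Metadata sources table on this page), assumes the name of the researcher's original file in each bundle is already known (for example from originalFileFormat in dataset.json), and uses an assumed Group-UUID format for GROUPID values.&lt;br /&gt;

```python
import uuid
from typing import Dict, List, Optional, Tuple

# Name endings treated as metadata, following the fileGrp USE="metadata" rule.
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def file_use(name: str, bundle_original: Optional[str] = None) -> str:
    """Return the fileGrp USE value for one file in the transfer.

    bundle_original names the researcher's uploaded file when `name` was
    unpacked from a bundle; it is None for a stand-alone file.
    """
    if name.lower().endswith(METADATA_SUFFIXES):
        return "metadata"
    if bundle_original is None or name == bundle_original:
        return "original"
    return "derivative"


def assign_groups(bundles: Dict[str, Tuple[str, List[str]]]) -> Dict[str, Tuple[str, str]]:
    """Give every file unpacked from the same bundle one shared GROUPID.

    bundles maps a bundle name to (original file name, unpacked file names).
    Returns {file name: (USE, GROUPID)}.
    """
    assigned = {}
    for original, members in bundles.values():
        group_id = f"Group-{uuid.uuid4()}"  # hypothetical GROUPID format
        for member in members:
            assigned[member] = (file_use(member, original), group_id)
    return assigned


yvr = assign_groups({"YVR_weather_data": ("YVR_weather_data.sav", [
    "YVR_weather_data.sav", "YVR_weather_data.tab", "YVR_weather_data.RData",
    "YVR_weather_data-ddi.xml", "YVR_weather_datacitation-endnote.xml",
    "YVR_weather_datacitation-ris.ris",
])})
for name, (use, _group) in yvr.items():
    print(use, name)
```

A transfer with several bundles would get one fresh GROUPID per bundle, while stand-alone files such as Study_info.pdf and dataset.json get no GROUPID at all.&lt;br /&gt;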
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
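A minimal Python sketch of the mapping in the table above follows. It assumes a simplified version of the Dataverse dataset JSON (citation fields under latestVersion/metadataBlocks/citation/fields, authors as a compound author field). Concatenating authority and identifier into IDNo, the function name ddi_fields, and the sample author are assumptions for illustration; the function returns plain values rather than writing DDI XML.&lt;br /&gt;

```python
def ddi_fields(dataset: dict) -> dict:
    """Map simplified Dataverse dataset JSON onto DDI Codebook element values."""
    version = dataset["latestVersion"]
    # Index citation fields by typeName, e.g. "title" or "author".
    citation = {
        field["typeName"]: field["value"]
        for field in version["metadataBlocks"]["citation"]["fields"]
    }
    authors = [a["authorName"]["value"] for a in citation.get("author", [])]
    return {
        "titl": citation.get("title"),
        # Assumption: IDNo is the authority and identifier concatenated.
        "IDNo": dataset["authority"] + dataset["identifier"],
        "IDNo/@agency": dataset["protocol"],
        "AuthEnty": authors,
        "distrbtr": dataset.get("publisher"),
        "version": f'{version["versionNumber"]}.{version["versionMinorNumber"]}',
        "version/@date": version.get("releaseTime"),
        "version/@type": version.get("versionState"),
        "restrctn": version.get("termsOfUse"),
    }


# Hypothetical sample; identifier values are taken from the table above.
example = {
    "protocol": "doi", "authority": "10.5072/FK2/", "identifier": "0MOPJM",
    "publisher": "Root Dataverse",
    "latestVersion": {
        "versionNumber": 1, "versionMinorNumber": 0, "versionState": "RELEASED",
        "metadataBlocks": {"citation": {"fields": [
            {"typeName": "title", "value": "Pacific weather patterns study"},
            {"typeName": "author", "value": [
                {"authorName": {"typeName": "authorName", "value": "Doe, Jane"}}]},
        ]}},
    },
}
print(ddi_fields(example)["IDNo"])
```

Fields missing from the JSON, such as termsOfUse in this sample, come back as None so that the METS writer can omit the matching DDI element.&lt;br /&gt;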
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet have derivation as a controlled type, https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
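The reciprocal relationships in the two examples above could be generated with Python's standard xml.etree.ElementTree, as in this sketch. The PREMIS 3 namespace URI and the helper names are assumptions. The sketch builds only the relationship elements, not the derivation event or the Dataverse agent.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 3 namespace; adjust if the AIP METS uses PREMIS 2.x.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified PREMIS tag name for ElementTree."""
    return f"{{{PREMIS_NS}}}{tag}"


def derivation_relationship(sub_type: str, related_uuid: str) -> ET.Element:
    """Build one premis:relationship pointing at a related object's UUID."""
    rel = ET.Element(q("relationship"))
    ET.SubElement(rel, q("relationshipType")).text = "derivation"
    ET.SubElement(rel, q("relationshipSubType")).text = sub_type
    ident = ET.SubElement(rel, q("relatedObjectIdentification"))
    ET.SubElement(ident, q("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def link_original_and_derivative(original_uuid: str, derivative_uuid: str):
    """Return the reciprocal relationships for the original and derivative objects."""
    return (
        derivation_relationship("is source of", derivative_uuid),  # on the SAV file
        derivation_relationship("has source", original_uuid),  # on the TAB file
    )


sav_rel, tab_rel = link_original_and_derivative("[SPSS SAV file UUID]", "[TAB file UUID]")
print(ET.tostring(sav_rel, encoding="unicode"))
```

Generating both sides from one call keeps the subtypes paired, so an original can never point at a derivative that does not point back.&lt;br /&gt;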
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
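A hedged Python sketch of the comparison behind this event follows. It assumes Dataverse supplies MD5 checksums, as described in note [3], and it lets the algorithm be swapped, since the sample above records hashlib.sha256(). The function name and returned keys are illustrative, not Archivematica's implementation.&lt;br /&gt;

```python
import hashlib
import tempfile
from pathlib import Path


def fixity_check(path: Path, expected: str, algorithm: str = "md5") -> dict:
    """Compare a file's digest with the checksum supplied by Dataverse.

    Returns the values a PREMIS fixity check event would record.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Hash in 64 KiB chunks so large data files are never held in memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    passed = actual == expected.strip().lower()
    note = f"Dataverse checksum {expected} " + ("verified" if passed else f"does not match {actual}")
    return {
        "eventType": "fixity check",
        "eventDetail": f'program="python"; module="hashlib.{algorithm}()"',
        "eventOutcome": "Pass" if passed else "Fail",
        "eventOutcomeDetailNote": note,
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Study_info.pdf"  # stand-in file with known content
    sample.write_bytes(b"hello")
    event = fixity_check(sample, "5d41402abc4b2a76b9719d911017c592")
    print(event["eventOutcome"], event["eventOutcomeDetailNote"])
```

Recording a Fail outcome rather than raising an exception keeps the mismatch visible in the event history, which is where the AIP METS file reports it.&lt;br /&gt;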
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
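Once a bag has been extracted (the sample above is a compressed .7z), its payload can be checked against manifest-sha512.txt. The Python sketch below assumes the BagIt manifest format of one checksum, whitespace, and a relative path per line. It ignores tag manifests and other BagIt validity rules, so a dedicated library such as the Library of Congress bagit-python package is the more complete option. The function name verify_manifest is hypothetical.&lt;br /&gt;

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def verify_manifest(bag_dir: Path, algorithm: str = "sha512") -> List[str]:
    """Return payload paths whose digest differs from manifest-ALGORITHM.txt.

    Each manifest line is assumed to be: checksum, whitespace, relative path.
    """
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        # Reads each payload file whole; fine for a sketch, not for very large files.
        actual = hashlib.new(algorithm, (bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches


# Build a tiny throwaway bag to show the check.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    (bag / "data" / "note.txt").write_bytes(b"hello")
    digest = hashlib.sha512(b"hello").hexdigest()
    (bag / "manifest-sha512.txt").write_text(f"{digest}  data/note.txt\n", encoding="utf-8")
    print(verify_manifest(bag))
```

An empty list means every payload file listed in the manifest still matches its recorded checksum.&lt;br /&gt;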
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows:&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and she wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like us to use either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to determine whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a Dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents; for instance, differentiating between the researcher who uploaded the files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include the equivalent of a fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12625</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12625"/>
		<updated>2018-08-30T14:20:45Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Sample transfer METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early proof-of-concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (exact release to be confirmed). This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL).&lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
==Overview==&lt;br /&gt;
This page captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''May 11, 2018'''&lt;br /&gt;
To see the current status of the work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour for preserving a dataset from a Dataverse. Feature files are also known as acceptance tests, because they specify the behaviour that we will test at the end of the project.&lt;br /&gt;
&lt;br /&gt;
The early drafts are documented in this Google Doc: [http://docs.behat.org/en/v2.5/guides/1.gherkin.html]&lt;br /&gt;
Once the draft has been reviewed, we will publish it to our acceptance test repository on GitHub.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
April 24, 2017&lt;br /&gt;
This feature requires a development branch of Archivematica, which can be installed with the following steps:&lt;br /&gt;
&lt;br /&gt;
# Install deploy-pub: https://github.com/artefactual/deploy-pub&lt;br /&gt;
# Use the archivematica-centos7 playbook in deploy-pub: https://github.com/artefactual/deploy-pub/tree/master/playbooks/archivematica-centos7&lt;br /&gt;
# Create a hosts file that lists your target machine (see the DigitalOcean example linked from the playbook).&lt;br /&gt;
# In requirements.yml, change the version of ansible-archivematica-src to &amp;quot;stable/1.6.x&amp;quot;.&lt;br /&gt;
# Change singlenode.yml to point to the host you defined in your hosts file.&lt;br /&gt;
# Change vars-singlenode.yml to include the following settings:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# required for dataverse testing&lt;br /&gt;
archivematica_src_am_version: &amp;quot;dev/dataverse-poc&amp;quot;&lt;br /&gt;
archivematica_src_automationtools: &amp;quot;yes&amp;quot;&lt;br /&gt;
archivematica_src_automationtools_version: &amp;quot;dev/dataverse&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Workflow==&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
*The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing Archivematica Archival Information Packages (AIPs) and placing them in archival storage, &amp;lt;strike&amp;gt; and updating the Dataverse study with the AIP UUIDs &amp;lt;/strike&amp;gt; (this was determined to be out of scope). &lt;br /&gt;
*Analysis is based on Dataverse tests using [https://apitest.dataverse.org https://apitest.dataverse.org] and [https://demo.dataverse.org https://demo.dataverse.org], online documentation at http://guides.dataverse.org/en/latest/api/index.html and discussions with Dataverse developers and users. &lt;br /&gt;
*Proposed integration is for Archivematica 1.5 and higher and Dataverse 4.x.&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram===&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse - Archivematica workflow_1.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram notes===&lt;br /&gt;
&lt;br /&gt;
[1] &amp;quot;Ingest script&amp;quot; refers to an [https://github.com/artefactual/automation-tools automation tool] that automates ingest into Archivematica for bulk processing. An existing automation tool would be modified to accomplish the tasks described in the workflow.&lt;br /&gt;
&lt;br /&gt;
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.&lt;br /&gt;
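&lt;br /&gt;
A minimal Python sketch of the test in note [2], assuming each study record carries the releaseTime value listed in the metadata table below (in a form like 2015-08-21T15:03:28Z) and that the ingest script records the time of its previous API call. The record layout and function names are hypothetical.&lt;br /&gt;
```python
from datetime import datetime, timezone

# Assumed Dataverse timestamp layout, e.g. 2015-08-21T15:03:28Z
TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def parse_release_time(value):
    """Parse a releaseTime string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def new_or_updated(studies, last_call):
    """Return studies published (first time or new version) since last_call.

    studies is a list of dicts with a releaseTime key (hypothetical layout);
    last_call is a timezone-aware datetime saved after the previous poll.
    """
    return [s for s in studies if parse_release_time(s['releaseTime']) > last_call]
```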
&lt;br /&gt;
[3] The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
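&lt;br /&gt;
The per-file fields described in note [3] could be read with a short Python sketch. The nesting used here (datasetVersion, files, dataFile, with id, filename and md5 keys) is an assumption based on the Dataverse 4.x native API JSON and may differ between versions; check it against a real dataset.json before relying on it.&lt;br /&gt;
```python
import json


def list_data_files(dataset_json_text):
    """Return (entity_id, filename, md5) tuples for every data file in a study.

    Assumes the layout
    {'datasetVersion': {'files': [{'dataFile': {'id': ..., 'filename': ..., 'md5': ...}}]}}
    """
    dataset = json.loads(dataset_json_text)
    files = dataset.get('datasetVersion', {}).get('files', [])
    result = []
    for entry in files:
        data_file = entry['dataFile']
        # Not every file is guaranteed to carry an md5, so default to None
        result.append((data_file['id'], data_file['filename'], data_file.get('md5')))
    return result
```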
&lt;br /&gt;
[4] If the json file gives a data file a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object, documented by Dataverse as containing all of the following files:&lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual].&lt;br /&gt;
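&lt;br /&gt;
Note [4] amounts to choosing a download endpoint per data file. A minimal Python sketch, assuming the Dataverse Access API paths /api/access/datafile/{id} and /api/access/datafile/bundle/{id}, and assuming ingested tabular files report the content type text/tab-separated-values; the function name is hypothetical.&lt;br /&gt;
```python
TABULAR_CONTENT_TYPE = 'text/tab-separated-values'


def download_url(server, data_file):
    """Return the Access API URL the ingest script would request for one file.

    Tabular files ingested by Dataverse are fetched as a zipped bundle
    (TAB file, original upload, derivatives, DDI XML and citations);
    every other file is fetched on its own.
    """
    base = server.rstrip('/') + '/api/access/datafile/'
    if data_file.get('contentType') == TABULAR_CONTENT_TYPE:
        return base + 'bundle/' + str(data_file['id'])
    return base + str(data_file['id'])
```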
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups: USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
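&lt;br /&gt;
The fileGrp and GROUPID rules above, together with the metadata-file rule in the table that follows, could be expressed roughly as below. This is a hypothetical Python sketch: the function names, and passing each bundle as a list of filenames plus the name of the researcher's original upload, are illustrative assumptions rather than the ingest script's actual interface.&lt;br /&gt;
```python
import uuid

# Files treated as metadata, per the fileGrp USE=metadata rule
METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_group(filename, original_filename=None):
    """Return the METS fileGrp USE value for one file.

    original_filename is the researcher's upload when the file came from a
    bundle, or None for a file that was downloaded on its own.
    """
    if filename.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    if original_filename is None or filename == original_filename:
        return 'original'
    # Anything else unpacked from a bundle was generated by Dataverse
    return 'derivative'


def describe_bundle(filenames, original_filename):
    """Assign fileGrp USE values and one shared GROUPID to a bundle's files."""
    group_id = str(uuid.uuid4())  # one GROUPID per bundle
    return [
        {'file': name, 'use': file_group(name, original_filename), 'groupid': group_id}
        for name in filenames
    ]
```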
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with a .json or .ris extension, plus any -ddi.xml and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by the ingest tool. All files unpacked from the same bundle are given the same GROUPID.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
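&lt;br /&gt;
The study-level rows of the table could be implemented roughly as follows. This Python sketch is hypothetical. It assumes the Dataverse JSON keeps citation fields as a list of typeName/value objects under datasetVersion, metadataBlocks, citation, fields, with authors as a compound author field. It also assumes that protocol, authority, identifier and publisher sit at the top level, and that joining authority and identifier yields the IDNo shown in the table example. It returns plain values rather than building DDI XML.&lt;br /&gt;
```python
import json


def citation_value(fields, type_name):
    """Return the value of the citation field with the given typeName, or None."""
    for field in fields:
        if field.get('typeName') == type_name:
            return field.get('value')
    return None


def ddi_terms(dataset_json_text):
    """Map dataset.json fields to the study-level DDI elements in the table.

    Key names and nesting are assumptions based on Dataverse 4.x JSON.
    """
    dataset = json.loads(dataset_json_text)
    version = dataset.get('datasetVersion', {})
    fields = version.get('metadataBlocks', {}).get('citation', {}).get('fields', [])
    authors = citation_value(fields, 'author') or []
    number = version.get('versionNumber')
    minor = version.get('versionMinorNumber')
    return {
        'titl': citation_value(fields, 'title'),
        'IDNo': dataset.get('authority', '') + dataset.get('identifier', ''),
        'IDNo@agency': dataset.get('protocol'),
        'AuthEnty': [a['authorName']['value'] for a in authors if 'authorName' in a],
        'distrbtr': dataset.get('publisher'),
        'version': None if number is None else '{}.{}'.format(number, minor or 0),
        'version@date': version.get('releaseTime'),
        'version@type': version.get('versionState'),
        'restrctn': version.get('termsOfUse'),
    }
```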
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica.&lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary, a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term; see https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
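&lt;br /&gt;
The paired relationships above mirror each other: the original file is source of the derivative, and the derivative has source the original. A minimal Python sketch using the standard library xml.etree.ElementTree module, assuming the PREMIS 3 namespace http://www.loc.gov/premis/v3; the function names are hypothetical.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS_NS = 'http://www.loc.gov/premis/v3'
ET.register_namespace('premis', PREMIS_NS)


def premis(tag):
    """Return a namespace-qualified PREMIS tag name."""
    return '{%s}%s' % (PREMIS_NS, tag)


def derivation_relationship(sub_type, related_uuid):
    """Build a premis:relationship element of type derivation.

    sub_type is 'is source of' (on the original file) or 'has source'
    (on the derivative); related_uuid identifies the other file.
    """
    rel = ET.Element(premis('relationship'))
    ET.SubElement(rel, premis('relationshipType')).text = 'derivation'
    ET.SubElement(rel, premis('relationshipSubType')).text = sub_type
    related = ET.SubElement(rel, premis('relatedObjectIdentification'))
    ET.SubElement(related, premis('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(related, premis('relatedObjectIdentifierValue')).text = related_uuid
    return rel


def linked_pair(original_uuid, derivative_uuid):
    """Return the relationships for the original file and the derivative file."""
    return (derivation_relationship('is source of', derivative_uuid),
            derivation_relationship('has source', original_uuid))
```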
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
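&lt;br /&gt;
A fixity check like the one recorded above compares a checksum supplied by Dataverse with one computed after download. A minimal Python sketch using hashlib. The event example names hashlib.sha256(), while the metadata table lists md5 for Dataverse data files, so the sketch takes the algorithm as a parameter and defaults to MD5. The function names and outcome wording are illustrative assumptions.&lt;br /&gt;
```python
import hashlib


def file_checksum(path, algorithm='md5', chunk_size=1024 * 1024):
    """Compute the hex digest of the file at path, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, expected, algorithm='md5'):
    """Compare a checksum received from Dataverse with one computed locally.

    Returns an (eventOutcome, eventOutcomeDetailNote) pair shaped like the
    PREMIS fixity check event; the wording is illustrative only.
    """
    actual = file_checksum(path, algorithm)
    if actual == expected.strip().lower():
        return 'Pass', 'Dataverse checksum %s verified' % expected
    return 'Fail', 'Dataverse checksum %s does not match computed %s' % (expected, actual)
```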
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. A Dataverse-derived AIP also includes additional metadata files, and each zipped bundle included in the ingest results in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt&lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
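&lt;br /&gt;
The manifest-sha512.txt file lists one SHA-512 checksum and relative path per payload file, so an uncompressed AIP can be re-verified with a few lines of Python. This minimal sketch uses only the standard library (the Library of Congress bagit-python library offers full validation). It assumes the bag has already been extracted and checks only the payload manifest, not tagmanifest-md5.txt or bag-info.txt; the function name is hypothetical.&lt;br /&gt;
```python
import hashlib
import os


def verify_payload_manifest(bag_dir, manifest_name='manifest-sha512.txt'):
    """Return the payload paths whose SHA-512 does not match the manifest.

    Each manifest line holds a checksum and a path relative to the bag
    directory, separated by whitespace.
    """
    failures = []
    with open(os.path.join(bag_dir, manifest_name), encoding='utf-8') as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            digest = hashlib.sha512()
            with open(os.path.join(bag_dir, rel_path), 'rb') as payload:
                for chunk in iter(lambda: payload.read(1024 * 1024), b''):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                failures.append(rel_path)
    return failures
```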
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, recorded as interesting opportunities or questions arise. At the end of the current phase, we will document the integration as well as future opportunities.&lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned.&lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface, although it is not mentioned in the Dataverse API guide. Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH and wasn’t sure it would be a ‘safe’ option.&lt;br /&gt;
Amaz would like to see us using either a standard API (such as OAI-PMH) or a versioned API.&lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So, as part of the project, we need to determine whether we could use the Native API (even if we don’t actually use it), or else raise this as an issue to discuss with the Dataverse team.&lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution.&lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to one another.&lt;br /&gt;
We talked about AICs (Archival Information Collections) as one possible solution, but agreed that this would be a new feature that needs to be thought through; there could be solutions other than AICs.&lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance by differentiating the researcher who uploaded the files from the research data manager who published the dataset.&lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include the equivalent of a fixity check on any UNFs created by Dataverse?&lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12624</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12624"/>
		<updated>2018-08-30T14:20:16Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Sample transfer METS file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early proof-of-concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (exact release to be confirmed). This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL).&lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
==Overview==&lt;br /&gt;
This page captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''May 11, 2018'''&lt;br /&gt;
To see the current status of the work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour for preserving a dataset from a Dataverse. Feature files are also known as acceptance tests, because they specify the behaviour that we will test at the end of the project.&lt;br /&gt;
&lt;br /&gt;
The early drafts are documented in this Google Doc: [http://docs.behat.org/en/v2.5/guides/1.gherkin.html]&lt;br /&gt;
Once the draft has been reviewed, we will publish it to our acceptance test repository on GitHub. &lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
April 24, 2017&lt;br /&gt;
This feature requires a development branch of Archivematica, which can be installed with the following steps:&lt;br /&gt;
&lt;br /&gt;
# Install deploy-pub: https://github.com/artefactual/deploy-pub&lt;br /&gt;
# Use the archivematica-centos7 playbook in deploy-pub: https://github.com/artefactual/deploy-pub/tree/master/playbooks/archivematica-centos7&lt;br /&gt;
# Create a hosts file that lists your target machine (see the DigitalOcean example linked from the playbook).&lt;br /&gt;
# In requirements.yml, change the version of ansible-archivematica-src to &amp;quot;stable/1.6.x&amp;quot;.&lt;br /&gt;
# Change singlenode.yml to point to the host you defined in your hosts file.&lt;br /&gt;
# Change vars-singlenode.yml to include the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# required for dataverse testing&lt;br /&gt;
archivematica_src_am_version: &amp;quot;dev/dataverse-poc&amp;quot;&lt;br /&gt;
archivematica_src_automationtools: &amp;quot;yes&amp;quot;&lt;br /&gt;
archivematica_src_automationtools_version: &amp;quot;dev/dataverse&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Workflow==&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
*The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing Archivematica Archival Information Packages (AIPs) and placing them in archival storage, &amp;lt;strike&amp;gt; and updating the Dataverse study with the AIP UUIDs &amp;lt;/strike&amp;gt; (this was determined to be out of scope). &lt;br /&gt;
*Analysis is based on Dataverse tests using [https://apitest.dataverse.org https://apitest.dataverse.org] and [https://demo.dataverse.org https://demo.dataverse.org], online documentation at http://guides.dataverse.org/en/latest/api/index.html and discussions with Dataverse developers and users. &lt;br /&gt;
*Proposed integration is for Archivematica 1.5 and higher and Dataverse 4.x.&lt;br /&gt;
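&lt;br /&gt;
The first workflow steps (calling the Dataverse API and picking out new or updated studies) can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the ingest script's actual code: it assumes the Native API endpoint /api/datasets/:persistentId/ described in the Dataverse API guide, and assumes each polled dataset record carries a releaseTime value in the form 2018-05-01T14:20:16Z. The helper names dataset_url and new_or_updated are illustrative only.&lt;br /&gt;
&lt;br /&gt;
```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed layout of Dataverse releaseTime values, e.g. 2018-05-01T14:20:16Z
RELEASE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def dataset_url(server: str, persistent_id: str) -> str:
    """Build a Native API URL that retrieves one dataset by persistent ID."""
    query = urlencode({"persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"


def parse_release_time(value: str) -> datetime:
    """Parse a releaseTime string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, RELEASE_TIME_FORMAT).replace(tzinfo=timezone.utc)


def new_or_updated(datasets: list, last_poll: datetime) -> list:
    """Keep datasets published (first time or as a new version) after the last poll.

    Each dataset is a hypothetical dict with at least a releaseTime key.
    """
    return [d for d in datasets if parse_release_time(d["releaseTime"]) > last_poll]


if __name__ == "__main__":
    print(dataset_url("https://demo.dataverse.org", "doi:10.5072/FK2/0MOPJM"))
    last_poll = datetime(2018, 5, 1, tzinfo=timezone.utc)
    studies = [
        {"id": 1, "releaseTime": "2018-04-30T09:00:00Z"},
        {"id": 2, "releaseTime": "2018-05-02T10:30:00Z"},
    ]
    print([d["id"] for d in new_or_updated(studies, last_poll)])
```
&lt;br /&gt;
Run as a script, this prints the URL https://demo.dataverse.org/api/datasets/:persistentId/?persistentId=doi%3A10.5072%2FFK2%2F0MOPJM and selects only study 2, the one released after the previous poll.&lt;br /&gt;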
&lt;br /&gt;
===Workflow diagram===&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse - Archivematica workflow_1.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram notes===&lt;br /&gt;
&lt;br /&gt;
[1] &amp;quot;Ingest script&amp;quot; refers to an [https://github.com/artefactual/automation-tools automation tool] designed to automate ingest into Archivematica for bulk processing. An existing automation tool would be modified to accomplish the tasks described in the workflow.&lt;br /&gt;
&lt;br /&gt;
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.&lt;br /&gt;
&lt;br /&gt;
[3] The JSON file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and MD5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file lists a data file with a content type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
A [http://guides.dataverse.org/en/latest/user/dataset-management.html?highlight=bundle bundle] is a zipped object that Dataverse documents as containing all of the following files: &lt;br /&gt;
&lt;br /&gt;
* As tab-delimited data (with the variable names in the first row);&lt;br /&gt;
* The original file uploaded by the user;&lt;br /&gt;
* Saved as R data (if the original file was not in R format);&lt;br /&gt;
* Variable Metadata (as a DDI Codebook XML file);&lt;br /&gt;
* Data File Citation (currently in either RIS or EndNote XML format);&lt;br /&gt;
&lt;br /&gt;
Supported tabular formats are listed in the Dataverse [http://guides.dataverse.org/en/latest/user/tabulardataingest/supportedformats.html manual]&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the JSON file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. the original uploaded data file, derivative files generated for tabular data, and metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different file types and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
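&lt;br /&gt;
Notes [3] and [4] can be illustrated with a short Python sketch that reads the dataset JSON and decides, for each data file, whether to request the single file or the zipped bundle. It assumes a Dataverse 4.x style layout (data, latestVersion, files, and a dataFile object with id, contentType and md5 keys, with the entity_id of note [3] taken to be the id key) and the Data Access API paths /api/access/datafile/ID and /api/access/datafile/bundle/ID; both should be checked against the Dataverse version in use. The function name plan_downloads is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json

TAB_CONTENT_TYPE = "text/tab-separated-values"


def plan_downloads(dataset_json: str, server: str) -> list:
    """Return (file id, Dataverse md5, download URL) for each data file.

    Assumes a Dataverse 4.x style response: data.latestVersion.files[],
    each entry holding a dataFile object with id, contentType and md5.
    """
    record = json.loads(dataset_json)
    base = server.rstrip("/")
    plan = []
    for entry in record["data"]["latestVersion"]["files"]:
        data_file = entry["dataFile"]
        file_id = data_file["id"]
        if data_file.get("contentType") == TAB_CONTENT_TYPE:
            # Tabular file: ask for the zipped bundle (TAB, original, R, DDI, citations)
            url = f"{base}/api/access/datafile/bundle/{file_id}"
        else:
            # Any other file: download it as uploaded
            url = f"{base}/api/access/datafile/{file_id}"
        # Keep the md5 so it can be verified after download (fixity check)
        plan.append((file_id, data_file.get("md5"), url))
    return plan
```
&lt;br /&gt;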
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata that are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
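&lt;br /&gt;
The file grouping described above can be expressed as a small Python sketch. It is illustrative only: the suffix rules come from the metadata sources table on this page, while the bundles mapping, the Group- prefix on generated GROUPID values and the function names are assumptions, not part of the ingest script.&lt;br /&gt;
&lt;br /&gt;
```python
import uuid
from typing import Optional

# Suffix rules for the metadata fileGrp (from the metadata sources table)
METADATA_SUFFIXES = (".json", ".ris", "-ddi.xml", "-endnote.xml")


def classify(name: str, bundle_original: Optional[str] = None) -> str:
    """Return the METS fileGrp USE value for one file in the transfer.

    bundle_original is the researcher-uploaded file of the bundle the file
    came from, or None if the file was not delivered inside a bundle.
    """
    if name.endswith(METADATA_SUFFIXES):
        return "metadata"
    if bundle_original is None or name == bundle_original:
        return "original"
    return "derivative"


def file_entries(loose_files: list, bundles: dict) -> list:
    """Return (name, USE, GROUPID) tuples for every file in the transfer.

    bundles maps a bundle name to (original file name, [unpacked file names]).
    All files unpacked from the same bundle share one generated GROUPID;
    loose files get None.
    """
    entries = [(name, classify(name), None) for name in loose_files]
    for original, members in bundles.values():
        group_id = f"Group-{uuid.uuid4()}"
        for name in members:
            entries.append((name, classify(name, original), group_id))
    return entries
```
&lt;br /&gt;
Applied to the sample study, this places Study_info.pdf and YVR_weather_data.sav in the original group, the TAB and RData files in the derivative group, and dataset.json plus the three citation/DDI files in the metadata group, with all six bundle members sharing one GROUPID.&lt;br /&gt;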
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|json: &amp;quot;publisher&amp;quot;&lt;br /&gt;
|json example: &amp;quot;publisher&amp;quot;: &amp;quot;Root Dataverse&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
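&lt;br /&gt;
A minimal Python sketch of how the first rows of this table might be applied to build the study-level DDI for dmdSec_1 with xml.etree.ElementTree. It assumes the Dataverse 4.x JSON layout (protocol, authority and identifier on the data object; citation fields under latestVersion.metadataBlocks.citation.fields, with authors in a compound author field) and assumes the DDI Codebook 2.5 namespace ddi:codebook:2_5. Only titl, IDNo and AuthEnty are shown, and the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

# Assumed DDI Codebook 2.5 namespace; adjust to the schema actually used
DDI_NS = "ddi:codebook:2_5"


def ddi(tag: str) -> str:
    """Return a namespace-qualified DDI tag name for ElementTree."""
    return f"{{{DDI_NS}}}{tag}"


def citation_value(fields: list, type_name: str):
    """Return the value of the first citation field with the given typeName."""
    for field in fields:
        if field["typeName"] == type_name:
            return field["value"]
    return None


def build_study_ddi(dataset: dict) -> ET.Element:
    """Build a minimal DDI codeBook with study-level terms from the dataset JSON.

    dataset is the data object of a Dataverse dataset JSON response.
    """
    fields = dataset["latestVersion"]["metadataBlocks"]["citation"]["fields"]
    code_book = ET.Element(ddi("codeBook"))
    citation = ET.SubElement(ET.SubElement(code_book, ddi("stdyDscr")), ddi("citation"))

    titl_stmt = ET.SubElement(citation, ddi("titlStmt"))
    # ddi:titl from the citation field whose typeName is title
    ET.SubElement(titl_stmt, ddi("titl")).text = citation_value(fields, "title")
    # ddi:IDNo from authority + identifier; protocol becomes the agency attribute
    id_no = ET.SubElement(titl_stmt, ddi("IDNo"), agency=dataset["protocol"])
    id_no.text = f"{dataset['authority'].rstrip('/')}/{dataset['identifier']}"

    # ddi:AuthEnty from each authorName inside the compound author field
    rsp_stmt = ET.SubElement(citation, ddi("rspStmt"))
    for author in citation_value(fields, "author") or []:
        ET.SubElement(rsp_stmt, ddi("AuthEnty")).text = author["authorName"]["value"]
    return code_book
```
&lt;br /&gt;
With the example values in the table, the IDNo element would read 10.5072/FK2/0MOPJM with agency doi.&lt;br /&gt;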
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica. &lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term; see https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
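&lt;br /&gt;
The reciprocal relationships in the two examples above can be generated with a short Python sketch using xml.etree.ElementTree. It assumes the PREMIS 3 namespace http://www.loc.gov/premis/v3 and uses hypothetical helper names; it produces only the relationship elements, not the derivation event or the Dataverse agent.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3 namespace, serialized with the premis: prefix
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def premis(tag: str) -> str:
    """Return a namespace-qualified PREMIS tag name for ElementTree."""
    return f"{{{PREMIS_NS}}}{tag}"


def derivation_relationship(sub_type: str, related_uuid: str) -> ET.Element:
    """Build a premis:relationship linking an original and a derivative file.

    sub_type is "is source of" on the original, "has source" on the derivative.
    """
    rel = ET.Element(premis("relationship"))
    ET.SubElement(rel, premis("relationshipType")).text = "derivation"
    ET.SubElement(rel, premis("relationshipSubType")).text = sub_type
    related = ET.SubElement(rel, premis("relatedObjectIdentification"))
    ET.SubElement(related, premis("relatedObjectIdentifierType")).text = "UUID"
    ET.SubElement(related, premis("relatedObjectIdentifierValue")).text = related_uuid
    return rel


def link_pair(original_uuid: str, derivative_uuid: str) -> tuple:
    """Return (relationship for the original, relationship for the derivative)."""
    return (
        derivation_relationship("is source of", derivative_uuid),
        derivation_relationship("has source", original_uuid),
    )
```
&lt;br /&gt;
Generating both sides from one call keeps the pair consistent: each object points at the other's UUID.&lt;br /&gt;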
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
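&lt;br /&gt;
A hedged Python sketch of the comparison behind this event. The example above names hashlib.sha256(), but the metadata sources table lists MD5 as the checksum Dataverse supplies, so this sketch assumes MD5 and uses hashlib.md5; swap the algorithm to match whatever the Dataverse JSON actually provides. The function names are hypothetical, and the returned strings mirror the eventOutcome and eventOutcomeDetailNote values shown above.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path: Path, dataverse_md5: str) -> tuple:
    """Compare a file against the checksum supplied by Dataverse.

    Returns (eventOutcome, eventOutcomeDetailNote) for the PREMIS event.
    """
    actual = md5_of(path)
    if actual == dataverse_md5.lower():
        return "Pass", f"Dataverse checksum {dataverse_md5} verified"
    return "Fail", f"Dataverse checksum {dataverse_md5} does not match calculated {actual}"
```
&lt;br /&gt;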
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
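&lt;br /&gt;
Once a bag like this has been extracted, its payload can be checked against manifest-sha512.txt. The following Python sketch is a simplified stand-in for a full BagIt validator: it assumes one checksum and one relative path per manifest line, does not decode the percent-encoded characters the BagIt specification allows in paths, and reads each file fully into memory. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir: Path, algorithm: str = "sha512") -> list:
    """Return payload paths whose checksum does not match manifest-ALG.txt.

    Each manifest line holds a checksum, whitespace, then a path relative
    to the bag root (for example data/objects/...).
    """
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        actual = hashlib.new(algorithm, (bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```
&lt;br /&gt;
An empty list means every file listed in the manifest matched; a missing payload file raises FileNotFoundError rather than being reported as a mismatch.&lt;br /&gt;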
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full JSON file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
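&lt;br /&gt;
The AIP METS file sits directly under data and is named after the AIP UUID, which is also the suffix of the AIP name. A small Python sketch of locating it follows; it assumes the archive extracts to a directory with the same name as the package minus its extension, and that the short list of archive extensions shown is sufficient. The names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import re
from pathlib import PurePosixPath

# An Archivematica AIP name ends with a hyphen followed by the AIP UUID
UUID_SUFFIX = re.compile(
    r"-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)
# Assumed set of package extensions to strip before reading the UUID
ARCHIVE_EXTENSIONS = (".7z", ".tar.bz2", ".tar.gz", ".zip")


def aip_mets_path(aip_name: str) -> PurePosixPath:
    """Return the path of the AIP METS file inside an extracted AIP bag."""
    for ext in ARCHIVE_EXTENSIONS:
        if aip_name.endswith(ext):
            aip_name = aip_name[: -len(ext)]
            break
    match = UUID_SUFFIX.search(aip_name)
    if match is None:
        raise ValueError(f"no UUID suffix in {aip_name!r}")
    return PurePosixPath(aip_name, "data", f"METS.{match.group(1)}.xml")
```
&lt;br /&gt;
For the sample package above this yields Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f/data/METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml.&lt;br /&gt;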
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
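&lt;br /&gt;
To check the fileSec of a finished AIP METS file against the outline above, the file groups can be read back with xml.etree.ElementTree. This Python sketch assumes the standard METS namespace http://www.loc.gov/METS/ and that each file carries an FLocat element with an xlink:href; the function name is illustrative.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def files_by_use(mets_root: ET.Element) -> dict:
    """Map each fileGrp USE value (original, derivative, ...) to its file hrefs."""
    groups = {}
    for file_grp in mets_root.iter(f"{{{METS_NS}}}fileGrp"):
        hrefs = groups.setdefault(file_grp.get("USE"), [])
        # Direct file children only; nested fileGrps are visited by iter()
        for file_el in file_grp.findall(f"{{{METS_NS}}}file"):
            flocat = file_el.find(f"{{{METS_NS}}}FLocat")
            if flocat is not None:
                hrefs.append(flocat.get(XLINK_HREF))
    return groups


# Usage: files_by_use(ET.parse("METS.xml").getroot())
```
&lt;br /&gt;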
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and wasn’t sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are using either a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to check whether we could use the Native API (even if we don’t actually use it), or raise it as an issue to discuss with the Dataverse team.   &lt;br /&gt;
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that it is not currently clear exactly which datasets should be preserved, and expects this will vary quite a bit by institution. &lt;br /&gt;
We discussed whether all datasets in a dataverse would be preserved (not currently known), which raised the question of how to relate datasets to each other. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that it’s a new feature and needs to be thought through; there could be other solutions than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance distinguishing the researcher who uploaded the files from the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12590</id>
		<title>Storage Service API</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12590"/>
		<updated>2018-07-31T21:58:52Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Create space */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
&lt;br /&gt;
The [[Storage Service]] API provides programmatic access to moving files around in storage areas that the Storage Service has access to.&lt;br /&gt;
&lt;br /&gt;
The API is written using [http://django-tastypie.readthedocs.io/en/latest/ TastyPie].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
| Improvement Note: TastyPie is less well supported than [http://www.django-rest-framework.org/ Django REST Framework], both in terms of docs &amp;amp; community. We should look at replacing TastyPie with DRF.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Endpoints require authentication with a username and API key. These can be submitted as GET parameters (e.g. &amp;lt;code&amp;gt;?username=test&amp;amp;api_key=e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;) or as a header (e.g. &amp;lt;code&amp;gt;Authorization: ApiKey test:e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;).&lt;br /&gt;
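&lt;br /&gt;
A minimal Python sketch of both authentication styles, using only the standard library. The endpoint and credentials are the placeholders used on this page; the helper names are hypothetical, and the request is built but not sent.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def header_auth_request(url: str, username: str, api_key: str) -> Request:
    """Build a GET request that sends credentials in the Authorization header."""
    return Request(url, headers={"Authorization": f"ApiKey {username}:{api_key}"})


def query_auth_url(url: str, username: str, api_key: str) -> str:
    """Append username and api_key as GET parameters, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("username", username), ("api_key", api_key)]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Usage (not executed here):
#   import json, urllib.request
#   req = header_auth_request("http://localhost:8000/api/v2/location/?format=json",
#                             "test", "YOUR_API_KEY")
#   with urllib.request.urlopen(req) as response:
#       data = json.load(response)
```
&lt;br /&gt;
The header form keeps the API key out of URLs, which can end up in server and proxy logs.&lt;br /&gt;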
&lt;br /&gt;
== A note about browsing ==&lt;br /&gt;
&lt;br /&gt;
A detailed schema can be found for each resource by appending &amp;quot;schema/&amp;quot; to its list (get all) URL.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H &amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/location/schema/?format=json'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;allowed_detail_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;,&lt;br /&gt;
        &amp;quot;post&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;allowed_list_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;default_format&amp;quot;: &amp;quot;application/json&amp;quot;,&lt;br /&gt;
    &amp;quot;default_limit&amp;quot;: 20,&lt;br /&gt;
    &amp;quot;fields&amp;quot;: {&lt;br /&gt;
        &amp;quot;description&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;description&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;enabled&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: true,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;True if space can be accessed.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;boolean&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Enabled&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Many related resources. Can be either a list of URIs or list of individually nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/pipeline/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_many&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;pipeline&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Purpose of the space.  Eg. AIP storage, Transfer source&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Purpose&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;quota&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: null,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Size, in bytes (optional)&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: true,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Quota&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Path to location, relative to the storage space's path.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Relative Path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;resource_uri&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;resource uri&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;space&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;A single related resource. Can be either a URI or set of nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/space/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_one&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;space&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;used&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: 0,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Amount used, in bytes.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Used&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unique identifier&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: true,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;uuid&amp;quot;&lt;br /&gt;
        }&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;filtering&amp;quot;: {&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;quota&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;space&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;used&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: 1&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
This schema describes, among other things, the fields in the resource (including the schema URI of related resource fields) and which fields allow filtering. A filtering value is either a list of Django ORM lookups (e.g. startswith, exact, lte) or one of Tastypie's constants: 1 (ALL), which allows every lookup on that field, or 2 (ALL_WITH_RELATIONS), which additionally allows filtering over the related resource's fields. For example, locations can be filtered by their pipeline's UUID by joining the field names with two underscores in a request parameter: &amp;lt;code&amp;gt;/api/v2/location/?pipeline__uuid=&amp;lt;uuid&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
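&lt;br /&gt;
As a minimal Python sketch of the double-underscore convention, the hypothetical helper below builds a filtered list URL with the standard library. The host and the extra purpose filter are illustrative assumptions; the URL is only constructed, not requested.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def filtered_url(base_url, resource, **filters):
    """Build a filtered list URL such as /api/v2/location/?pipeline__uuid=...

    Hypothetical helper: keyword names are passed through unchanged, so a
    related-field filter is written with a double underscore (pipeline__uuid).
    """
    url = "{}/api/v2/{}/".format(base_url.rstrip("/"), resource)
    if filters:
        # Sort the filters so the query string is stable and predictable.
        url += "?" + urlencode(sorted(filters.items()))
    return url


# Placeholder values: the pipeline UUID is the one used in the examples on this page.
print(filtered_url("http://localhost:8000", "location",
                   pipeline__uuid="dd354557-9e6e-4918-9fe3-a65b00ecb1af",
                   purpose="AS"))
```
&lt;br /&gt;
The same helper covers plain Django lookups, e.g. a &amp;lt;code&amp;gt;used__gt&amp;lt;/code&amp;gt; filter on spaces.&lt;br /&gt;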
&lt;br /&gt;
For more information on interacting with the API, see:&lt;br /&gt;
&lt;br /&gt;
http://django-tastypie.readthedocs.io/en/v0.13.1/interacting.html&lt;br /&gt;
&lt;br /&gt;
== Pipeline ==&lt;br /&gt;
&lt;br /&gt;
=== Get all pipelines ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of pipelines. See [[#Get pipeline details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the pipelines in the system. Results can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by description or UUID. Disabled pipelines are not returned.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/pipeline/?description__startswith=Archivematica' | python -m json.tool&lt;br /&gt;
 {&lt;br /&gt;
     &amp;quot;meta&amp;quot;: {&lt;br /&gt;
         &amp;quot;limit&amp;quot;: 20,&lt;br /&gt;
         &amp;quot;next&amp;quot;: null,&lt;br /&gt;
         &amp;quot;offset&amp;quot;: 0,&lt;br /&gt;
         &amp;quot;previous&amp;quot;: null,&lt;br /&gt;
         &amp;quot;total_count&amp;quot;: 1&lt;br /&gt;
     },&lt;br /&gt;
     &amp;quot;objects&amp;quot;: [&lt;br /&gt;
         {&lt;br /&gt;
             &amp;quot;description&amp;quot;: &amp;quot;Archivematica on alouette&amp;quot;,&lt;br /&gt;
             &amp;quot;remote_name&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;br /&gt;
             &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/dd354557-9e6e-4918-9fe3-a65b00ecb1af/&amp;quot;,&lt;br /&gt;
             &amp;quot;uuid&amp;quot;: &amp;quot;dd354557-9e6e-4918-9fe3-a65b00ecb1af&amp;quot;&lt;br /&gt;
         }&lt;br /&gt;
     ]&lt;br /&gt;
 }&lt;br /&gt;
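&lt;br /&gt;
Results are paginated (see &amp;lt;code&amp;gt;limit&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;offset&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;next&amp;lt;/code&amp;gt; in &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;). A minimal Python sketch of walking every page follows, assuming &amp;lt;code&amp;gt;meta.next&amp;lt;/code&amp;gt; is either null or a URI that can be resolved against the server's base URL; the helper name and its injectable &amp;lt;code&amp;gt;fetch&amp;lt;/code&amp;gt; parameter are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def iter_objects(base_url, path, username, api_key, fetch=None):
    """Yield every object from a paginated list endpoint such as /api/v2/pipeline/.

    Hypothetical helper. `fetch` takes an absolute URL and returns the decoded
    JSON body; by default it performs an authenticated GET with urllib.
    """
    headers = {"Authorization": "ApiKey {}:{}".format(username, api_key)}

    def default_fetch(url):
        with urlopen(Request(url, headers=headers)) as resp:
            return json.load(resp)

    fetch = fetch or default_fetch
    url = urljoin(base_url, path)
    while url:
        page = fetch(url)
        yield from page["objects"]
        # meta.next is null on the last page; otherwise it is resolved
        # against base_url, so a relative URI works as well as an absolute one.
        next_uri = page["meta"]["next"]
        url = urljoin(base_url, next_uri) if next_uri else None


# Usage against a live server (not run here):
# for pipeline in iter_objects("http://localhost:8000", "/api/v2/pipeline/",
#                              "test", "YOUR_API_KEY"):
#     print(pipeline["uuid"], pipeline["description"])
```
&lt;br /&gt;
Because &amp;lt;code&amp;gt;fetch&amp;lt;/code&amp;gt; can be swapped out, the paging logic can be exercised without a running Storage Service.&lt;br /&gt;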
&lt;br /&gt;
=== Create new pipeline ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain fields for a new pipeline: &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_key&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_username&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt;: If true, associates default [[Storage Service#Locations | Locations]] with the newly created pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;shared_path&amp;lt;/code&amp;gt;: If default locations are created, the [[Storage Service#Currently Processing | processing]] location is created at this path in the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: URI of the pipeline.&lt;br /&gt;
*** Before v0.11.0: If &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt; is set, the Storage Service will try to guess the value from the request's &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; (the client's IP address).&lt;br /&gt;
*** In v0.11.0 or newer: If not provided, the Storage Service will try to guess the value from the request's &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt;.&lt;br /&gt;
* '''Response''': JSON with data for the pipeline&lt;br /&gt;
&lt;br /&gt;
If the 'Pipelines disabled on creation' setting is enabled, new pipelines are disabled by default and will not respond to queries.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X POST -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; -H&amp;quot;Content-Type: application/json&amp;quot; -d'{&amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;, &amp;quot;create_default_locations&amp;quot;: true, &amp;quot;api_username&amp;quot;: &amp;quot;demo&amp;quot;, &amp;quot;api_key&amp;quot;: &amp;quot;03ecb307f5b8012f4771d245d534830378a87259&amp;quot;}' 'http://192.168.1.42:8000/api/v2/pipeline/'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;create_default_locations&amp;quot;: true,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;,&lt;br /&gt;
    &amp;quot;remote_name&amp;quot;: &amp;quot;192.168.1.42&amp;quot;,&lt;br /&gt;
    &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/99354557-9e6e-4918-9fe3-a65b00ecb199/&amp;quot;,&lt;br /&gt;
    &amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
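&lt;br /&gt;
The same request can be prepared from Python. This is a minimal sketch using only the standard library: it builds, but does not send, the POST shown above. The helper name is hypothetical, and optional fields are only included when given.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.request import Request


def create_pipeline_request(base_url, username, api_key, pipeline_uuid,
                            description, pipeline_api_username,
                            pipeline_api_key, create_default_locations=True,
                            shared_path=None, remote_name=None):
    """Build (but do not send) a POST /api/v2/pipeline/ request. Hypothetical helper."""
    payload = {
        "uuid": pipeline_uuid,
        "description": description,
        "api_username": pipeline_api_username,
        "api_key": pipeline_api_key,
        "create_default_locations": create_default_locations,
    }
    # Optional fields are only sent when given; when remote_name is omitted
    # the Storage Service may try to guess it from the client address.
    if shared_path is not None:
        payload["shared_path"] = shared_path
    if remote_name is not None:
        payload["remote_name"] = remote_name
    return Request(
        base_url.rstrip("/") + "/api/v2/pipeline/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "ApiKey {}:{}".format(username, api_key),
            "Content-Type": "application/json",
        },
        method="POST",
    )
```
&lt;br /&gt;
Passing the returned object to &amp;lt;code&amp;gt;urllib.request.urlopen&amp;lt;/code&amp;gt; would actually send it.&lt;br /&gt;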
&lt;br /&gt;
=== Get pipeline details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Pipeline description&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: IP or hostname of the pipeline. For use in API calls&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI for this pipeline in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
&lt;br /&gt;
== Space ==&lt;br /&gt;
&lt;br /&gt;
=== Get all spaces ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Protocol that the [[Storage Service#Space | Space]] uses. Filter using the protocol's database code.&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size in bytes. Can use greater than (size__gt=1024), less than (size__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. Can use greater than (used__gt=1024), less than (used__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the Space&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of spaces. See [[#Get space details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the spaces in the system.  Can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by several fields: access protocol, path, size, amount used, UUID and verified status. Disabled spaces are not returned.&lt;br /&gt;
&lt;br /&gt;
=== Get space details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Database code for the access protocol&lt;br /&gt;
** &amp;lt;code&amp;gt;last_verified&amp;lt;/code&amp;gt;: Date of last verification. This is a stub feature&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI to the resource in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size of the space in bytes.&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. &lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the space&lt;br /&gt;
** &amp;lt;code&amp;gt;verified&amp;lt;/code&amp;gt;: If the space is verified. This is a stub feature&lt;br /&gt;
** Other space-specific fields&lt;br /&gt;
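&lt;br /&gt;
A client often wants the free capacity of a space. The sketch below derives it from &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;, assuming (as the Create space parameters state) that a size of 0 or null means unlimited, and accepting either numbers or numeric strings. The helper name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def remaining_bytes(space):
    """Return the free bytes in a space, or None if the space is unlimited.

    `space` is the decoded JSON from GET /api/v2/space/UUID/.
    Assumption: a null or 0 size means unlimited; size and used may
    arrive as integers or numeric strings.
    """
    size = int(space.get("size") or 0)
    used = int(space.get("used") or 0)
    if size == 0:
        return None
    # Clamp at zero in case a space is already over its configured size.
    return max(size - used, 0)
```
&lt;br /&gt;
For example, a space with size 1000 and 250 bytes used has 750 bytes remaining.&lt;br /&gt;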
&lt;br /&gt;
=== Browse space path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Space to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries (files and directories) at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Create space ===&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/archivematica/Issues/issues/36 Issue 36].&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain fields for a new space: See the [https://www.archivematica.org/en/docs/storage-service-0.11/administrators/#id2 Storage Service Documentation] or [https://wiki.archivematica.org/Storage_Service#Space Space] for fields relevant to each type of space. Basic fields for a local file system space are listed below. &lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Database code of the access protocol, which defines the type of space (e.g. &amp;lt;code&amp;gt;S3&amp;lt;/code&amp;gt;)&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Absolute path to the Space on the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: (Optional) Maximum size allowed for this space, in bytes. Set to 0 or leave blank for unlimited.&lt;br /&gt;
&lt;br /&gt;
Example (to create an S3 space):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ curl \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -d @payload.json \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
        http://127.0.0.1:62081/api/v2/space/&lt;br /&gt;
&lt;br /&gt;
Where payload.json contains&lt;br /&gt;
{&lt;br /&gt;
    &amp;quot;access_protocol&amp;quot;: &amp;quot;S3&amp;quot;,&lt;br /&gt;
    &amp;quot;path&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
    &amp;quot;staging_path&amp;quot;: &amp;quot;/&amp;quot;,&lt;br /&gt;
    &amp;quot;endpoint_url&amp;quot;: &amp;quot;http://127.0.0.1:12345&amp;quot;,&lt;br /&gt;
    &amp;quot;access_key_id&amp;quot;: &amp;quot;_Cah4cae1_&amp;quot;,&lt;br /&gt;
    &amp;quot;secret_access_key&amp;quot;: &amp;quot;_Thu6Ahqu_&amp;quot;,&lt;br /&gt;
    &amp;quot;region&amp;quot;: &amp;quot;us-west-2&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
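&lt;br /&gt;
For comparison, here is a minimal Python sketch of a local filesystem payload built from the basic fields listed above. The access protocol code &amp;lt;code&amp;gt;FS&amp;lt;/code&amp;gt; is an assumption about the database code for local filesystem spaces and should be checked against your Storage Service version; the helper name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def local_space_payload(path, size=None, staging_path=None):
    """Build a JSON body for POST /api/v2/space/ describing a local filesystem space.

    Hypothetical helper. Assumption: "FS" is the access protocol code for
    local filesystem spaces.
    """
    # The documented path is an absolute path on the local (POSIX) filesystem.
    if not path.startswith("/"):
        raise ValueError("path must be an absolute path on the local filesystem")
    payload = {"access_protocol": "FS", "path": path}
    if staging_path is not None:
        payload["staging_path"] = staging_path
    # Omitting size (or passing 0) leaves the space unlimited.
    if size:
        payload["size"] = int(size)
    return json.dumps(payload)
```
&lt;br /&gt;
The resulting string can be saved as &amp;lt;code&amp;gt;payload.json&amp;lt;/code&amp;gt; and posted with the curl command shown above.&lt;br /&gt;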
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Location ==&lt;br /&gt;
&lt;br /&gt;
=== Get all locations ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new location ===&lt;br /&gt;
&lt;br /&gt;
Added in v0.12 - see [https://github.com/artefactual/archivematica-storage-service/issues/367 issue 367] and [https://github.com/archivematica/Issues/issues/37 issue 37].&lt;br /&gt;
&lt;br /&gt;
This endpoint creates a location in the storage service, but it doesn't actually create the directory that the location points to.  &lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the location.&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: List of pipeline URIs. This is a to-many field, so a list is sent even for a single pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;space&amp;lt;/code&amp;gt;: URI of the space.&lt;br /&gt;
** &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt;: If true, this location will be the default for its purpose.&lt;br /&gt;
** &amp;lt;code&amp;gt;purpose&amp;lt;/code&amp;gt;: Purpose code. One of:&lt;br /&gt;
*** &amp;lt;code&amp;gt;AR&amp;lt;/code&amp;gt; (AIP_RECOVERY)&lt;br /&gt;
*** &amp;lt;code&amp;gt;AS&amp;lt;/code&amp;gt; (AIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;CP&amp;lt;/code&amp;gt; (CURRENTLY_PROCESSING)&lt;br /&gt;
*** &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (DIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SD&amp;lt;/code&amp;gt; (SWORD_DEPOSIT)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (STORAGE_SERVICE_INTERNAL)&lt;br /&gt;
*** &amp;lt;code&amp;gt;BL&amp;lt;/code&amp;gt; (BACKLOG)&lt;br /&gt;
*** &amp;lt;code&amp;gt;TS&amp;lt;/code&amp;gt; (TRANSFER_SOURCE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;RP&amp;lt;/code&amp;gt; (REPLICATOR)&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Path to the location, relative to the space's path.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl -s -d '{&lt;br /&gt;
    &amp;quot;pipeline&amp;quot;: [&amp;quot;/api/v2/pipeline/90707555-244f-47af-8271-66496a6a965b/&amp;quot;],&lt;br /&gt;
    &amp;quot;purpose&amp;quot;: &amp;quot;TS&amp;quot;,&lt;br /&gt;
    &amp;quot;relative_path&amp;quot;: &amp;quot;foo/bar&amp;quot;,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;foobar&amp;quot;,&lt;br /&gt;
    &amp;quot;space&amp;quot;: &amp;quot;/api/v2/space/141593ff-2a27-44a1-9de1-917573fa0f4a/&amp;quot;&lt;br /&gt;
}' \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
        &amp;quot;http://127.0.0.1:62081/api/v2/location/&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
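&lt;br /&gt;
A minimal Python sketch of building the same body, which checks the purpose against the codes listed above and always sends &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; as a list. The helper name is hypothetical, and it builds only the JSON body.&lt;br /&gt;
&lt;br /&gt;
```python
import json

# Purpose codes as listed for the Create new location endpoint.
LOCATION_PURPOSES = {
    "AR": "AIP_RECOVERY", "AS": "AIP_STORAGE", "CP": "CURRENTLY_PROCESSING",
    "DS": "DIP_STORAGE", "SD": "SWORD_DEPOSIT",
    "SS": "STORAGE_SERVICE_INTERNAL", "BL": "BACKLOG",
    "TS": "TRANSFER_SOURCE", "RP": "REPLICATOR",
}


def location_payload(space_uri, pipeline_uris, purpose, relative_path,
                     description="", default=False):
    """Build a JSON body for POST /api/v2/location/. Hypothetical helper."""
    if purpose not in LOCATION_PURPOSES:
        raise ValueError("unknown purpose code: {!r}".format(purpose))
    # Accept a single URI for convenience, but always send a list because
    # pipeline is a to-many field.
    if isinstance(pipeline_uris, str):
        pipeline_uris = [pipeline_uris]
    return json.dumps({
        "space": space_uri,
        "pipeline": list(pipeline_uris),
        "purpose": purpose,
        "relative_path": relative_path,
        "description": description,
        "default": default,
    })
```
&lt;br /&gt;
As the endpoint description notes, creating the location record does not create the directory at &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;.&lt;br /&gt;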
&lt;br /&gt;
=== Get location details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Move files to this location ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location the files should be moved from&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the [[Storage Service#Pipeline | pipeline]]. Both Locations must be associated with this pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dicts, each containing &amp;lt;code&amp;gt;source&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;destination&amp;lt;/code&amp;gt;: the path of a file to be moved, relative to the origin Location, and its target path, relative to this Location.&lt;br /&gt;
&lt;br /&gt;
Intended for creating Transfers, SIPs, and other cases where files need to be moved but not tracked by the Storage Service.&lt;br /&gt;
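&lt;br /&gt;
A minimal Python sketch of assembling the move request body from (source, destination) pairs. The helper name and the example paths are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def move_files_payload(origin_location_uri, pipeline_uri, moves):
    """Build the JSON body for POST /api/v2/location/UUID/. Hypothetical helper.

    `moves` is an iterable of (source, destination) pairs: source is relative
    to the origin Location, destination to the Location being posted to.
    """
    files = [{"source": src, "destination": dst} for src, dst in moves]
    if not files:
        raise ValueError("at least one file must be moved")
    return json.dumps({
        "origin_location": origin_location_uri,
        # Both Locations must be associated with this pipeline.
        "pipeline": pipeline_uri,
        "files": files,
    })
```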
&lt;br /&gt;
=== Browse location path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Location to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries (files and directories) at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
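&lt;br /&gt;
Since version 2 of the browse endpoints returns base64-encoded paths, a client has to decode them before use. A minimal Python sketch follows that decodes a browse response and separates directories from files. It assumes the decoded names are UTF-8; that assumption and the helper name are not part of the API.&lt;br /&gt;
&lt;br /&gt;
```python
import base64


def decode_browse(response, encoded=True):
    """Split a browse response into (directories, files) as str paths.

    Hypothetical helper. With API v2 the entries are base64 encoded and are
    decoded first (assumed UTF-8); pass encoded=False for v1 plain strings.
    """
    def dec(value):
        return base64.b64decode(value).decode("utf-8") if encoded else value

    entries = [dec(e) for e in response.get("entries", [])]
    directories = [dec(d) for d in response.get("directories", [])]
    # directories is a subset of entries, so everything else is a file.
    dir_set = set(directories)
    files = [e for e in entries if e not in dir_set]
    return directories, files
```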
&lt;br /&gt;
=== SWORD collection ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/sword/collection/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Package ==&lt;br /&gt;
&lt;br /&gt;
=== Get all packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body. Fields for a new package:&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the new package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;transfer&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposit&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from&lt;br /&gt;
** &amp;lt;code&amp;gt;related_package_uuid&amp;lt;/code&amp;gt;: UUID of a package that is related to this one. E.g. UUID of a DIP when storing an AIP&lt;br /&gt;
&lt;br /&gt;
Creates a database entry tracking the package (AIP, transfer, etc.). If the package is an AIP, DIP or AIC and &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt; is an AIP or DIP storage location, the files are also moved from the origin location to the current location. If the package is a transfer and &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt; is the transfer backlog, it is moved as well.&lt;br /&gt;
&lt;br /&gt;
This is handled through the modified &amp;lt;code&amp;gt;obj_create&amp;lt;/code&amp;gt; function, which calls &amp;lt;code&amp;gt;Package.store_aip&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Package.backlog_transfer&amp;lt;/code&amp;gt;.&lt;br /&gt;
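&lt;br /&gt;
The rule for when a POST also moves files can be summarised as a small decision function. This Python sketch only restates the behaviour described in this section using the location purpose codes (AS, DS, BL); it is not the Storage Service's own implementation, and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def files_are_moved(package_type, current_location_purpose):
    """Return True if creating the package also moves its files.

    Hypothetical summary of the documented rule, not Storage Service code:
    AIPs, AICs and DIPs are moved into AIP (AS) or DIP (DS) storage;
    transfers are moved into the transfer backlog (BL).
    """
    if package_type in ("AIP", "AIC", "DIP"):
        return current_location_purpose in ("AS", "DS")
    if package_type == "transfer":
        return current_location_purpose == "BL"
    # Other package types are only tracked in the database.
    return False
```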
&lt;br /&gt;
=== Get package details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Update package contents ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Flag marking this request as a reingest. Reduces the chance of accidentally modifying an AIP.&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the existing package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from.  This must be the same pipeline reingest was started on (tracked through &amp;lt;code&amp;gt;Package.misc_attributes.reingest_pipeline&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Updates the contents of a package during reingest. If the package is an AIP or AIC, is currently stored in an AIP storage location, and the &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt; parameter is set, this calls &amp;lt;code&amp;gt;Package.finish_reingest&amp;lt;/code&amp;gt;, which merges the new AIP with the existing one.&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Update package metadata ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PATCH&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Pipeline UUID or &amp;lt;code&amp;gt;null&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Used to update metadata stored in the database for the package.  Currently, this is used to update the reingest status.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Currently, this always sets Package.misc_attributes.reingest to None, regardless of what value was actually passed in.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;update_in_place&amp;lt;/code&amp;gt; also helps.&lt;br /&gt;
&lt;br /&gt;
=== Delete package request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/delete_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for deleting the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline the delete request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the deletion. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the deletion.&lt;br /&gt;
&lt;br /&gt;
=== Recover AIP request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/recover_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for recovering the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the recovery request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the recovery. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the recovery.&lt;br /&gt;
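&lt;br /&gt;
The delete and recover requests share the same body shape. A minimal Python sketch that builds it and enforces the documented rule that &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt; is an integer greater than 0. The helper name is hypothetical, and &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; is passed through unchanged because the two endpoints document it differently (UUID for delete, URI for recover).&lt;br /&gt;
&lt;br /&gt;
```python
import json


def aip_request_payload(event_reason, pipeline, user_id, user_email):
    """Build the JSON body for POST .../delete_aip/ or .../recover_aip/.

    Hypothetical helper; `pipeline` is passed through as given.
    """
    # user_id is the user's ID on the pipeline and must be a positive integer;
    # bool is rejected explicitly because it is a subclass of int.
    if isinstance(user_id, bool) or not isinstance(user_id, int) or not user_id >= 1:
        raise ValueError("user_id must be an integer greater than 0")
    return json.dumps({
        "event_reason": event_reason,
        "pipeline": pipeline,
        "user_id": user_id,
        "user_email": user_email,
    })
```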
&lt;br /&gt;
=== Download single file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/extract_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt;: Path to the file to download, relative to the package path.&lt;br /&gt;
* '''Response''': Stream of the requested file&lt;br /&gt;
&lt;br /&gt;
Returns a single file from the package. If the package is compressed, the Storage Service downloads and extracts the whole AIP to retrieve the file.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to only check for existence, without returning the file. Currently, the Storage Service has no way to check whether a file exists except by downloading and extracting the AIP.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
&lt;br /&gt;
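Below is a small sketch that builds the request URL and interprets the two success codes described above. It assumes a hypothetical &amp;lt;code&amp;gt;BASE_URL&amp;lt;/code&amp;gt;, and the function names are illustrative only.&lt;br /&gt;
```python
import urllib.parse

# Hypothetical Storage Service address; replace with your own.
BASE_URL = 'http://localhost:8000'


def extract_file_url(package_uuid, relative_path):
    # relative_path_to_file is sent as a query string parameter and is
    # percent-encoded (slashes become %2F, spaces become +).
    query = urllib.parse.urlencode({'relative_path_to_file': relative_path})
    return '{}/api/v2/file/{}/extract_file/?{}'.format(
        BASE_URL, package_uuid, query)


def describe_status(status_code):
    # 200 carries the file stream; 202 means the package is not locally
    # available (e.g. held in Arkivum) and the administrator was emailed.
    if status_code == 200:
        return 'file returned'
    if status_code == 202:
        return 'not locally available; administrator notified'
    return 'unexpected status {}'.format(status_code)
```
&amp;lt;code&amp;gt;urllib.request.urlopen&amp;lt;/code&amp;gt; only raises for non-2xx responses, so a &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; is returned normally. Check &amp;lt;code&amp;gt;response.status&amp;lt;/code&amp;gt; before treating the body as the requested file.&lt;br /&gt;
&lt;br /&gt;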
=== Download package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;chunk number&amp;gt;/&amp;lt;/code&amp;gt; (for [[Storage Service#LOCKSS-o-matic | LOCKSS]] harvesting)&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the package&lt;br /&gt;
&lt;br /&gt;
Returns the entire package as a single file. If the AIP is uncompressed, it is first packed into a single file using &amp;lt;code&amp;gt;tar&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If the download URL includes a chunk number, the Storage Service attempts to serve that LOCKSS chunk of the package. If the package is not in LOCKSS, it returns the whole package.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
&lt;br /&gt;
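The response is a stream of the whole package, which may be large. The sketch below builds the download URL, optionally with a LOCKSS chunk number, and copies the stream to disk in chunks. It uses only the standard library; &amp;lt;code&amp;gt;BASE_URL&amp;lt;/code&amp;gt; and both helper names are hypothetical.&lt;br /&gt;
```python
import shutil

# Hypothetical Storage Service address; replace with your own.
BASE_URL = 'http://localhost:8000'


def download_url(package_uuid, chunk_number=None):
    # Without a chunk number the whole package is requested; with one, the
    # Storage Service tries to serve that LOCKSS chunk instead.
    url = '{}/api/v2/file/{}/download/'.format(BASE_URL, package_uuid)
    if chunk_number is not None:
        url += '{}/'.format(int(chunk_number))
    return url


def save_stream(response, destination, chunk_size=1024 * 1024):
    # The response body is a stream; copy it to disk in fixed-size chunks
    # instead of reading the whole package into memory.
    with open(destination, 'wb') as out:
        shutil.copyfileobj(response, out, chunk_size)
```
For example, call &amp;lt;code&amp;gt;save_stream(urllib.request.urlopen(req), 'aip-download')&amp;lt;/code&amp;gt; after checking that the status is not &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;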
=== Get pointer file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/pointer_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the pointer file.&lt;br /&gt;
&lt;br /&gt;
=== Check fixity ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/check_fixity/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;force_local&amp;lt;/code&amp;gt;: If true, download and run fixity on the AIP locally, instead of using the Space-provided fixity if available.&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True if the verification succeeded, False if the verification failed, None if the scan could not start&lt;br /&gt;
** &amp;lt;code&amp;gt;message&amp;lt;/code&amp;gt;: Human-readable string explaining the report; it will be empty for successful scans.&lt;br /&gt;
** &amp;lt;code&amp;gt;failures&amp;lt;/code&amp;gt;: List of 0 or more errors&lt;br /&gt;
** &amp;lt;code&amp;gt;timestamp&amp;lt;/code&amp;gt;: ISO-formatted string with the datetime of the last fixity check. If the check was performed by an external system, the timestamp is provided by that system. If not provided, or on error, it is None.&lt;br /&gt;
&lt;br /&gt;
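In the JSON response, the values True, False and None appear as &amp;lt;code&amp;gt;true&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;false&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;null&amp;lt;/code&amp;gt;. Below is a minimal sketch that turns a report into a one-line summary. The function name and the wording of the summaries are illustrative.&lt;br /&gt;
```python
import json


def summarize_fixity(response_text):
    # success is true (passed), false (failed) or null (scan never started),
    # so compare with 'is' rather than relying on truthiness.
    report = json.loads(response_text)
    success = report.get('success')
    if success is True:
        return 'passed'
    if success is False:
        failures = report.get('failures') or []
        return 'failed ({} failure(s)): {}'.format(
            len(failures), report.get('message', ''))
    return 'not run: {}'.format(report.get('message', ''))
```
&lt;br /&gt;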
=== AIP storage callback request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/send_callback/post_store/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
Triggers any Callbacks configured to run post-storage for this AIP.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: This only works on locally available AIPs (AIPs stored in Spaces that are available via a UNIX filesystem layer).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True&lt;br /&gt;
** &amp;lt;code&amp;gt;package&amp;lt;/code&amp;gt;: UUID of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dictionaries with file information. Each dictionary has:&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_id&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
*** &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_package&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
*** &amp;lt;code&amp;gt;checksum&amp;lt;/code&amp;gt;: Checksum of the file, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Returns metadata about every file within the package.&lt;br /&gt;
&lt;br /&gt;
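Below is a minimal sketch of consuming this response, for instance to compare checksums against a local copy. &amp;lt;code&amp;gt;checksums_by_path&amp;lt;/code&amp;gt; is a hypothetical helper.&lt;br /&gt;
```python
import json


def checksums_by_path(response_text):
    # Map each file's relative path (name) to its checksum, skipping files
    # whose checksum is an empty string.
    data = json.loads(response_text)
    return {
        entry['name']: entry['checksum']
        for entry in data.get('files', [])
        if entry.get('checksum')
    }
```
&lt;br /&gt;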
=== Update file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON list of dictionaries with information on the files to be added. Each dict must have the following attributes:&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Adds a set of files to a package.&lt;br /&gt;
&lt;br /&gt;
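Below is a sketch of building the PUT body, with a client-side check that every entry has the required attributes. The helper name is hypothetical. Note that the body is a JSON list, not an object.&lt;br /&gt;
```python
import json

# Attributes every entry must carry, per the list above.
REQUIRED_KEYS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid', 'origin')


def contents_put_body(files):
    # Validate each entry, then serialise the whole list as the PUT body.
    files = list(files)
    for index, entry in enumerate(files):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError('entry {} is missing: {}'.format(
                index, ', '.join(missing)))
    return json.dumps(files).encode('utf-8')
```
&lt;br /&gt;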
=== Delete file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': DELETE&lt;br /&gt;
&lt;br /&gt;
Removes all file records associated with this package.&lt;br /&gt;
&lt;br /&gt;
=== Query file information on packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/metadata/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
* '''Parameters''': Query string parameters. At least one of the following must be provided:&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
* '''Response''': JSON. List of dicts with file information about the files that match the query.&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;file_extension&amp;lt;/code&amp;gt;: File extension&lt;br /&gt;
** &amp;lt;code&amp;gt;filename&amp;lt;/code&amp;gt;: Name of the file, sans path.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
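Below is a sketch of building the query URL for a GET request. It enforces the at-least-one-parameter rule before anything is sent. &amp;lt;code&amp;gt;BASE_URL&amp;lt;/code&amp;gt; and the helper name are hypothetical.&lt;br /&gt;
```python
import urllib.parse

# Hypothetical Storage Service address; replace with your own.
BASE_URL = 'http://localhost:8000'
QUERY_KEYS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid')


def metadata_query_url(**params):
    # Keep only the parameters that were actually given.
    query = {key: value for key, value in params.items() if value}
    unknown = sorted(set(query) - set(QUERY_KEYS))
    if unknown:
        raise ValueError('unsupported parameters: {}'.format(', '.join(unknown)))
    if not query:
        raise ValueError('at least one query parameter is required')
    # Sorted for a stable, predictable URL.
    return '{}/api/v2/file/metadata/?{}'.format(
        BASE_URL, urllib.parse.urlencode(sorted(query.items())))
```
&lt;br /&gt;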
=== Reingest AIP ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/reingest/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline to reingest on&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt;: Type of reingest to start. One of &amp;lt;code&amp;gt;METADATA_ONLY&amp;lt;/code&amp;gt; (metadata-only reingest), &amp;lt;code&amp;gt;OBJECTS&amp;lt;/code&amp;gt; (partial reingest), &amp;lt;code&amp;gt;FULL&amp;lt;/code&amp;gt; (full reingest)&lt;br /&gt;
** &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt;: Optional. Name of the processing configuration to use on full reingest&lt;br /&gt;
&lt;br /&gt;
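Below is a minimal sketch of the JSON body. It restricts &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt; to the three documented values; the helper name is hypothetical.&lt;br /&gt;
```python
import json

REINGEST_TYPES = ('METADATA_ONLY', 'OBJECTS', 'FULL')


def reingest_body(pipeline_uuid, reingest_type, processing_config=None):
    # METADATA_ONLY, OBJECTS (partial) or FULL, as listed above.
    if reingest_type not in REINGEST_TYPES:
        raise ValueError('reingest_type must be one of {}'.format(
            ', '.join(REINGEST_TYPES)))
    body = {'pipeline': pipeline_uuid, 'reingest_type': reingest_type}
    # processing_config is optional and documented for full reingest.
    if processing_config is not None:
        body['processing_config'] = processing_config
    return json.dumps(body)
```
&lt;br /&gt;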
=== SWORD endpoints ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/media/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/state/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Development documentation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12589</id>
		<title>Storage Service API</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12589"/>
		<updated>2018-07-31T21:57:52Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Space */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Development]] &amp;gt; Storage Service API&lt;br /&gt;
&lt;br /&gt;
The [[Storage Service]] API provides programmatic access for moving files within and between the storage areas that the Storage Service can access.&lt;br /&gt;
&lt;br /&gt;
The API is written using [http://django-tastypie.readthedocs.io/en/latest/ TastyPie].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: TastyPie is less well supported than [http://www.django-rest-framework.org/ Django REST Framework], both in terms of docs &amp;amp; community. We should look at replacing TastyPie with DRF.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Endpoints require authentication with a username and API key. These can be submitted as GET parameters (e.g. &amp;lt;code&amp;gt;?username=test&amp;amp;api_key=e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;) or as a header (e.g. &amp;lt;code&amp;gt;Authorization: ApiKey test:e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
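Both authentication forms can be produced with the Python standard library. Below is a minimal sketch; the helper names are hypothetical. The header form keeps the API key out of URLs, which tend to end up in server and proxy logs.&lt;br /&gt;
```python
import urllib.parse
import urllib.request


def with_api_key_header(url, username, api_key):
    # Header form: Authorization: ApiKey username:api_key
    return urllib.request.Request(
        url, headers={'Authorization': 'ApiKey {}:{}'.format(username, api_key)})


def with_api_key_params(url, username, api_key):
    # Query-string form: append username and api_key to any existing query.
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query)
    query += [('username', username), ('api_key', api_key)]
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query)))
```
&lt;br /&gt;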
== A note about browsing ==&lt;br /&gt;
&lt;br /&gt;
A detailed schema for each resource can be retrieved by appending &amp;lt;code&amp;gt;schema/&amp;lt;/code&amp;gt; to the resource's list (get all) URL.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/location/schema/?format=json'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;allowed_detail_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;,&lt;br /&gt;
        &amp;quot;post&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;allowed_list_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;default_format&amp;quot;: &amp;quot;application/json&amp;quot;,&lt;br /&gt;
    &amp;quot;default_limit&amp;quot;: 20,&lt;br /&gt;
    &amp;quot;fields&amp;quot;: {&lt;br /&gt;
        &amp;quot;description&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;description&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;enabled&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: true,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;True if space can be accessed.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;boolean&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Enabled&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Many related resources. Can be either a list of URIs or list of individually nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/pipeline/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_many&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;pipeline&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Purpose of the space.  Eg. AIP storage, Transfer source&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Purpose&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;quota&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: null,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Size, in bytes (optional)&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: true,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Quota&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Path to location, relative to the storage space's path.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Relative Path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;resource_uri&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;resource uri&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;space&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;A single related resource. Can be either a URI or set of nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/space/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_one&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;space&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;used&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: 0,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Amount used, in bytes.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Used&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unique identifier&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: true,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;uuid&amp;quot;&lt;br /&gt;
        }&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;filtering&amp;quot;: {&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;quota&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;space&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;used&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: 1&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
This schema describes, among other things, the fields in the resource (including the schema URI of related resource fields) and which fields allow filtering. A filtering value is either a list of allowed Django ORM lookups (e.g. startswith, exact, lte), 1 (all lookups are allowed on the field), or 2 (all lookups are allowed, including on the fields of the related resource). For example, because &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; has a filtering value of 2, locations can be filtered by their pipeline's UUID using a request parameter joined with a double underscore: &amp;lt;code&amp;gt;/api/v2/location/?pipeline__uuid=&amp;lt;uuid&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
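Below is a sketch that checks a schema's filtering value and builds a filtered list URL. The helper names and &amp;lt;code&amp;gt;BASE_URL&amp;lt;/code&amp;gt; are hypothetical. The &amp;lt;code&amp;gt;schema&amp;lt;/code&amp;gt; argument is the decoded JSON returned by a schema URL such as the one above.&lt;br /&gt;
```python
import urllib.parse

# Hypothetical Storage Service address; replace with your own.
BASE_URL = 'http://localhost:8000'


def related_filter_allowed(schema, field):
    # A filtering value of 2 allows lookups through the related resource,
    # e.g. pipeline__uuid on locations.
    return schema.get('filtering', {}).get(field) == 2


def filtered_list_url(resource, **filters):
    # Filter names use TastyPie's double-underscore syntax.
    query = urllib.parse.urlencode(sorted(filters.items()))
    return '{}/api/v2/{}/?{}'.format(BASE_URL, resource, query)
```
&lt;br /&gt;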
For more info on how to interact with the API see:&lt;br /&gt;
&lt;br /&gt;
http://django-tastypie.readthedocs.io/en/v0.13.1/interacting.html&lt;br /&gt;
&lt;br /&gt;
== Pipeline ==&lt;br /&gt;
&lt;br /&gt;
=== Get all pipelines ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of pipelines. See [[#Get pipeline details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the pipelines in the system.  Can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by the description or uuid. Disabled pipelines are not returned.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/pipeline/?description__startswith=Archivematica' | python -m json.tool&lt;br /&gt;
 {&lt;br /&gt;
     &amp;quot;meta&amp;quot;: {&lt;br /&gt;
         &amp;quot;limit&amp;quot;: 20,&lt;br /&gt;
         &amp;quot;next&amp;quot;: null,&lt;br /&gt;
         &amp;quot;offset&amp;quot;: 0,&lt;br /&gt;
         &amp;quot;previous&amp;quot;: null,&lt;br /&gt;
         &amp;quot;total_count&amp;quot;: 1&lt;br /&gt;
     },&lt;br /&gt;
     &amp;quot;objects&amp;quot;: [&lt;br /&gt;
         {&lt;br /&gt;
             &amp;quot;description&amp;quot;: &amp;quot;Archivematica on alouette&amp;quot;,&lt;br /&gt;
             &amp;quot;remote_name&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;br /&gt;
             &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/dd354557-9e6e-4918-9fe3-a65b00ecb1af/&amp;quot;,&lt;br /&gt;
             &amp;quot;uuid&amp;quot;: &amp;quot;dd354557-9e6e-4918-9fe3-a65b00ecb1af&amp;quot;&lt;br /&gt;
         }&lt;br /&gt;
     ]&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
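List responses are paginated; here the page size is 20, per &amp;lt;code&amp;gt;limit&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;meta.next&amp;lt;/code&amp;gt; holds the location of the next page, or &amp;lt;code&amp;gt;null&amp;lt;/code&amp;gt; on the last one. Below is a minimal sketch of walking every page. &amp;lt;code&amp;gt;fetch_json&amp;lt;/code&amp;gt; is a hypothetical callable you supply, for example one that sends the ApiKey header and decodes the JSON response.&lt;br /&gt;
```python
import urllib.parse


def iter_objects(first_url, fetch_json):
    # fetch_json(url) must return the decoded JSON page for an absolute URL.
    url = first_url
    while url:
        page = fetch_json(url)
        for obj in page.get('objects', []):
            yield obj
        next_path = page.get('meta', {}).get('next')
        # meta.next is usually a server-relative path, so resolve it
        # against the URL of the page just fetched; null ends the loop.
        url = urllib.parse.urljoin(url, next_path) if next_path else None
```
&lt;br /&gt;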
=== Create new pipeline ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain fields for a new pipeline: &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_key&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_username&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt;: If True, associates default [[Storage Service#Locations | Locations]] with the newly created pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;shared_path&amp;lt;/code&amp;gt;: If default locations are created, create the [[Storage Service#Currently Processing | processing]] location at this path in the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: URI of the pipeline.&lt;br /&gt;
*** Before v0.11.0: If &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt; is set, SS will try to guess the value using the &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; header.&lt;br /&gt;
*** In v0.11.0 or newer: If not provided, SS will try to guess the value using the &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; header.&lt;br /&gt;
* '''Response''': JSON with data for the pipeline&lt;br /&gt;
&lt;br /&gt;
If the 'Pipelines disabled on creation' setting is enabled, new pipelines are disabled by default and will not respond to queries.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X POST -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; -H&amp;quot;Content-Type: application/json&amp;quot; -d'{&amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;, &amp;quot;create_default_locations&amp;quot;: true, &amp;quot;api_username&amp;quot;: &amp;quot;demo&amp;quot;, &amp;quot;api_key&amp;quot;: &amp;quot;03ecb307f5b8012f4771d245d534830378a87259&amp;quot;}' 'http://192.168.1.42:8000/api/v2/pipeline/'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;create_default_locations&amp;quot;: true,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;,&lt;br /&gt;
    &amp;quot;remote_name&amp;quot;: &amp;quot;192.168.1.42&amp;quot;,&lt;br /&gt;
    &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/99354557-9e6e-4918-9fe3-a65b00ecb199/&amp;quot;,&lt;br /&gt;
    &amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
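Below is a sketch that assembles a request body like the one in the example above. It includes the optional fields only when they apply; the helper name is hypothetical.&lt;br /&gt;
```python
import json


def new_pipeline_body(uuid, description, api_username, api_key,
                      create_default_locations=False, shared_path=None,
                      remote_name=None):
    # Required fields for a new pipeline.
    body = {
        'uuid': uuid,
        'description': description,
        'api_username': api_username,
        'api_key': api_key,
        'create_default_locations': create_default_locations,
    }
    # shared_path is only meaningful when default locations are created.
    if create_default_locations and shared_path:
        body['shared_path'] = shared_path
    # If remote_name is omitted, v0.11.0 and newer guess it from REMOTE_ADDR.
    if remote_name:
        body['remote_name'] = remote_name
    return json.dumps(body)
```
&lt;br /&gt;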
=== Get pipeline details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Pipeline description&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: IP or hostname of the pipeline. For use in API calls&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI for this pipeline in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
&lt;br /&gt;
== Space ==&lt;br /&gt;
&lt;br /&gt;
=== Get all spaces ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Protocol that the [[Storage Service#Space | Space]] uses. Must be given as the protocol's database code.&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size in bytes. Can use greater than (size__gt=1024), less than (size__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. Can use greater than (used__gt=1024), less than (used__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the Space&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of spaces. See [[#Get space details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the spaces in the system.  Can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by several fields: access protocol, path, size, amount used, UUID and verified status. Disabled spaces are not returned.&lt;br /&gt;
&lt;br /&gt;
=== Get space details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Database code for the access protocol&lt;br /&gt;
** &amp;lt;code&amp;gt;last_verified&amp;lt;/code&amp;gt;: Date of last verification. This is a stub feature&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI to the resource in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size of the space in bytes.&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. &lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the space&lt;br /&gt;
** &amp;lt;code&amp;gt;verified&amp;lt;/code&amp;gt;: If the space is verified. This is a stub feature&lt;br /&gt;
** Other space-specific fields&lt;br /&gt;
&lt;br /&gt;
=== Browse space path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Space to look&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries at path, files or directories&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories in path. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
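The version note at the end of the Space section states that version 2 returns all paths base64 encoded. The sketch below assumes that this applies to the &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; lists here, and that the decoded paths are UTF-8. The helper name is hypothetical.&lt;br /&gt;
```python
import base64
import json


def _decode_path(value):
    # Assumption: decoded paths are UTF-8; other encodings would need
    # different handling (e.g. keeping the raw bytes).
    return base64.b64decode(value).decode('utf-8')


def decode_browse(response_text):
    # Decode every base64-encoded path in a v2 browse response.
    data = json.loads(response_text)
    return {
        'entries': [_decode_path(v) for v in data.get('entries', [])],
        'directories': [_decode_path(v) for v in data.get('directories', [])],
    }
```
&lt;br /&gt;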
=== Create space ===&lt;br /&gt;
&lt;br /&gt;
See [https://github.com/archivematica/Issues/issues/36 Issue 36].&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain the fields for a new space. See the [https://www.archivematica.org/en/docs/storage-service-0.11/administrators/#id2 Storage Service Documentation] or [https://wiki.archivematica.org/Storage_Service#Space Space] for the fields relevant to each type of space. The basic fields for a local filesystem space are listed below.&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Defines the type of space&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Absolute path to the Space on the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: (Optional) Maximum size allowed for this space, in bytes. Set to 0 or leave blank for unlimited.&lt;br /&gt;
&lt;br /&gt;
Example (to create an S3 space):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ curl \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -d @payload.json \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
        http://127.0.0.1:62081/api/v2/space/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;code&amp;gt;payload.json&amp;lt;/code&amp;gt; contains:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
    &amp;quot;access_protocol&amp;quot;: &amp;quot;S3&amp;quot;,&lt;br /&gt;
    &amp;quot;path&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
    &amp;quot;staging_path&amp;quot;: &amp;quot;/&amp;quot;,&lt;br /&gt;
    &amp;quot;endpoint_url&amp;quot;: &amp;quot;http://127.0.0.1:12345&amp;quot;,&lt;br /&gt;
    &amp;quot;access_key_id&amp;quot;: &amp;quot;_Cah4cae1_&amp;quot;,&lt;br /&gt;
    &amp;quot;secret_access_key&amp;quot;: &amp;quot;_Thu6Ahqu_&amp;quot;,&lt;br /&gt;
    &amp;quot;region&amp;quot;: &amp;quot;us-west-2&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
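The same request can be prepared in Python using only the standard library. This is a sketch: the host, port and &amp;lt;code&amp;gt;test:test&amp;lt;/code&amp;gt; credentials are the placeholder values from the curl example above, &amp;lt;code&amp;gt;build_create_space_request&amp;lt;/code&amp;gt; is a hypothetical helper, and the request is built but not sent.&lt;br /&gt;
&lt;br /&gt;
```python
import json
import urllib.request

# Placeholder values taken from the curl example; not real credentials.
BASE_URL = 'http://127.0.0.1:62081'
API_USER = 'test'
API_KEY = 'test'


def build_create_space_request(payload):
    """Prepare, but do not send, POST /api/v2/space/ with a JSON body."""
    return urllib.request.Request(
        BASE_URL + '/api/v2/space/',
        data=json.dumps(payload).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'ApiKey {}:{}'.format(API_USER, API_KEY),
        },
        method='POST',
    )


s3_payload = {
    'access_protocol': 'S3',
    'path': '',
    'staging_path': '/',
    'endpoint_url': 'http://127.0.0.1:12345',
    'access_key_id': '_Cah4cae1_',
    'secret_access_key': '_Thu6Ahqu_',
    'region': 'us-west-2',
}
request = build_create_space_request(s3_payload)
# Passing the request to urllib.request.urlopen() would actually send it.
```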
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Browse space path, version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64-encoded&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Location ==&lt;br /&gt;
&lt;br /&gt;
=== Get all locations ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new location ===&lt;br /&gt;
&lt;br /&gt;
Added in v0.12 - see [https://github.com/artefactual/archivematica-storage-service/issues/367 issue 367] and [https://github.com/archivematica/Issues/issues/37 issue 37].&lt;br /&gt;
&lt;br /&gt;
This endpoint creates a location record in the Storage Service. It does not create the directory that the location points to.&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the location.&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: List of URIs of the pipelines associated with this location.&lt;br /&gt;
** &amp;lt;code&amp;gt;space&amp;lt;/code&amp;gt;: URI of the space.&lt;br /&gt;
** &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt;: If 'true', this location will be the default for its purpose.&lt;br /&gt;
** &amp;lt;code&amp;gt;purpose&amp;lt;/code&amp;gt;: Purpose of the location. One of the following values:&lt;br /&gt;
*** &amp;lt;code&amp;gt;AR&amp;lt;/code&amp;gt; (AIP_RECOVERY)&lt;br /&gt;
*** &amp;lt;code&amp;gt;AS&amp;lt;/code&amp;gt; (AIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;CP&amp;lt;/code&amp;gt; (CURRENTLY_PROCESSING)&lt;br /&gt;
*** &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (DIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SD&amp;lt;/code&amp;gt; (SWORD_DEPOSIT)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (STORAGE_SERVICE_INTERNAL)&lt;br /&gt;
*** &amp;lt;code&amp;gt;BL&amp;lt;/code&amp;gt; (BACKLOG)&lt;br /&gt;
*** &amp;lt;code&amp;gt;TS&amp;lt;/code&amp;gt; (TRANSFER_SOURCE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;RP&amp;lt;/code&amp;gt; (REPLICATOR)&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Path to the location, relative to the space's path.&lt;br /&gt;
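&lt;br /&gt;
The following sketch assembles a request body from these fields in Python. The helper name is hypothetical. It wraps &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; in a list because the example below passes a list of pipeline URIs. It also assumes that &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt; is sent as a JSON boolean.&lt;br /&gt;
&lt;br /&gt;
```python
import json

# Purpose codes as listed above.
LOCATION_PURPOSES = {'AR', 'AS', 'CP', 'DS', 'SD', 'SS', 'BL', 'TS', 'RP'}


def build_location_payload(space_uri, pipeline_uris, purpose, relative_path,
                           description='', default=False):
    """Return the JSON body for POST /api/v2/location/ as a string."""
    if purpose not in LOCATION_PURPOSES:
        raise ValueError('unknown purpose code: ' + purpose)
    body = {
        'space': space_uri,
        'pipeline': list(pipeline_uris),  # one or more pipeline URIs
        'purpose': purpose,
        'relative_path': relative_path,
        'description': description,
        'default': default,  # assumed to be a JSON boolean
    }
    return json.dumps(body)
```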
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl -s -d '{&lt;br /&gt;
    &amp;quot;pipeline&amp;quot;: [&amp;quot;/api/v2/pipeline/90707555-244f-47af-8271-66496a6a965b/&amp;quot;],&lt;br /&gt;
    &amp;quot;purpose&amp;quot;: &amp;quot;TS&amp;quot;,&lt;br /&gt;
    &amp;quot;relative_path&amp;quot;: &amp;quot;foo/bar&amp;quot;,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;foobar&amp;quot;,&lt;br /&gt;
    &amp;quot;space&amp;quot;: &amp;quot;/api/v2/space/141593ff-2a27-44a1-9de1-917573fa0f4a/&amp;quot;&lt;br /&gt;
}' \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
        &amp;quot;http://127.0.0.1:62081/api/v2/location/&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Get location details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Move files to this location ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location the files should be moved from&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the [[Storage Service#Pipeline | pipeline]]. Both Locations must be associated with this pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dicts containing &amp;lt;code&amp;gt;source&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;destination&amp;lt;/code&amp;gt;: the path of each file to be moved, relative to the origin Location and to this Location respectively.&lt;br /&gt;
&lt;br /&gt;
Intended for creating Transfers, SIPs, etc., and for other cases where files need to be moved but not tracked by the Storage Service.&lt;br /&gt;
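&lt;br /&gt;
The following minimal Python sketch builds the request body from &amp;lt;code&amp;gt;(source, destination)&amp;lt;/code&amp;gt; pairs. The helper name, URIs and paths are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def build_move_payload(origin_location_uri, pipeline_uri, file_pairs):
    """Body for POST /api/v2/location/{uuid}/ (move files into that Location).

    file_pairs is an iterable of (source, destination) tuples: source is
    relative to the origin Location, destination to the target Location.
    """
    return {
        'origin_location': origin_location_uri,
        'pipeline': pipeline_uri,  # both Locations must belong to it
        'files': [
            {'source': source, 'destination': destination}
            for source, destination in file_pairs
        ],
    }


payload = build_move_payload(
    '/api/v2/location/ORIGIN-UUID/',  # hypothetical URIs
    '/api/v2/pipeline/PIPELINE-UUID/',
    [('incoming/transfer-a/', 'transfer-a/')],
)
```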
&lt;br /&gt;
=== Browse location path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Location to browse&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries (files and directories) at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;; a subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
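&lt;br /&gt;
With version 2, a client has to decode the returned paths. The following minimal Python sketch assumes that the response has already been parsed into a dict, that the paths are UTF-8, and that only &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; need decoding. The sketch ignores any other keys, and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import base64


def decode_browse_response(response):
    """Decode the base64-encoded paths in a version 2 browse response.

    Assumes UTF-8 paths; keys other than entries and directories are ignored.
    """
    def decode(value):
        return base64.b64decode(value).decode('utf-8')

    return {
        'entries': [decode(v) for v in response.get('entries', [])],
        'directories': [decode(v) for v in response.get('directories', [])],
    }


print(decode_browse_response({'entries': ['Zm9v', 'YmFy'], 'directories': ['Zm9v']}))
# prints {'entries': ['foo', 'bar'], 'directories': ['foo']}
```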
&lt;br /&gt;
=== SWORD collection ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/sword/collection/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Package ==&lt;br /&gt;
&lt;br /&gt;
=== Get all packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON. Fields for a new package:&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the new package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;transfer&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposit&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from&lt;br /&gt;
** &amp;lt;code&amp;gt;related_package_uuid&amp;lt;/code&amp;gt;: UUID of a package that is related to this one, for example the UUID of a DIP when storing an AIP&lt;br /&gt;
&lt;br /&gt;
Creates a database entry tracking the package (AIP, transfer, etc.). If the package is an AIP, DIP or AIC and the &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt; is an AIP or DIP storage location, it also moves the files from the origin location to the current location. If the package is a transfer and the &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt; is transfer backlog, the files are also moved.&lt;br /&gt;
&lt;br /&gt;
This is handled through the modified &amp;lt;code&amp;gt;obj_create&amp;lt;/code&amp;gt; function, which calls &amp;lt;code&amp;gt;Package.store_aip&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Package.backlog_transfer&amp;lt;/code&amp;gt;.&lt;br /&gt;
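&lt;br /&gt;
The rule described above can be restated as a small Python predicate. This restates the prose only and is not the Storage Service's actual &amp;lt;code&amp;gt;obj_create&amp;lt;/code&amp;gt; code. It uses the package types and Location purpose codes from this page (&amp;lt;code&amp;gt;AS&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;BL&amp;lt;/code&amp;gt;), and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Codes as they appear on this page.
STORED_PACKAGE_TYPES = {'AIP', 'DIP', 'AIC'}
STORAGE_PURPOSES = {'AS', 'DS'}  # AIP storage, DIP storage
BACKLOG_PURPOSE = 'BL'


def creation_moves_files(package_type, current_location_purpose):
    """Restate the prose: does creating this package also move its files?"""
    if package_type in STORED_PACKAGE_TYPES:
        return current_location_purpose in STORAGE_PURPOSES
    if package_type == 'transfer':
        return current_location_purpose == BACKLOG_PURPOSE
    # Any other combination only creates the database entry.
    return False
```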
&lt;br /&gt;
=== Get package details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Update package contents ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Flag marking this update as a reingest. This reduces the chance of accidentally modifying an AIP.&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the existing package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from.  This must be the same pipeline reingest was started on (tracked through &amp;lt;code&amp;gt;Package.misc_attributes.reingest_pipeline&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Updates the contents of a package during reingest.  If the package is an AIP or AIC, currently stored in an AIP storage location, and the 'reingest' parameter is set, it will call &amp;lt;code&amp;gt;Package.finish_reingest&amp;lt;/code&amp;gt; and merge the new AIP with the existing one.&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Update package metadata ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PATCH&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Pipeline UUID or None.&lt;br /&gt;
&lt;br /&gt;
Used to update metadata stored in the database for the package.  Currently, this is used to update the reingest status.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Currently, this always sets Package.misc_attributes.reingest to None, regardless of what value was actually passed in.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;update_in_place&amp;lt;/code&amp;gt; is also involved.&lt;br /&gt;
&lt;br /&gt;
=== Delete package request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/delete_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for deleting the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline the delete request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the deletion. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;: Email of the user requesting the deletion.&lt;br /&gt;
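&lt;br /&gt;
The following minimal Python sketch builds the request body and checks the &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt; constraint stated above. The helper name and the sample values are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def build_delete_request(event_reason, pipeline_uuid, user_id, user_email):
    """Body for POST /api/v2/file/{uuid}/delete_aip/."""
    # user_id is the ID of the user on the pipeline and must be an integer
    # greater than 0; bool is rejected because it is a subclass of int.
    valid_id = (isinstance(user_id, int)
                and not isinstance(user_id, bool)
                and user_id > 0)
    if not valid_id:
        raise ValueError('user_id must be an integer greater than 0')
    return {
        'event_reason': event_reason,
        'pipeline': pipeline_uuid,
        'user_id': user_id,
        'user_email': user_email,
    }


body = build_delete_request('Duplicate AIP', '90707555-244f-47af-8271-66496a6a965b',
                            1, 'demo@example.com')
```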
&lt;br /&gt;
=== Recover AIP request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/recover_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for recovering the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the recovery request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the recovery. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;: Email of the user requesting the recovery.&lt;br /&gt;
&lt;br /&gt;
=== Download single file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/extract_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt;: Path to the file to download, relative to the package path.&lt;br /&gt;
* '''Response''': Stream of the requested file&lt;br /&gt;
&lt;br /&gt;
Returns a single file from the Package.  If the package is compressed, it downloads the whole AIP and extracts it.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence. Currently, the storage service has no way to check if a file exists except by downloading and extracting the AIP.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
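&lt;br /&gt;
A client therefore has to distinguish a normal response from &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt;. The following Python sketch builds the request URL and classifies the status code. It assumes that a successful download returns &amp;lt;code&amp;gt;200&amp;lt;/code&amp;gt;. The function names, package UUID and file path are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def extract_file_url(package_uuid, relative_path):
    """URL for GET /api/v2/file/{uuid}/extract_file/."""
    query = urlencode({'relative_path_to_file': relative_path})
    return '/api/v2/file/{}/extract_file/?{}'.format(package_uuid, query)


def describe_status(status_code):
    """Classify the status code of an extract_file response."""
    if status_code == 200:
        return 'ok'  # assumed status when the file is streamed back
    if status_code == 202:
        return 'pending'  # not locally available; the administrator is emailed
    return 'error'


print(extract_file_url('00000000-0000-0000-0000-000000000000', 'objects/image.tif'))
# prints /api/v2/file/00000000-0000-0000-0000-000000000000/extract_file/?relative_path_to_file=objects%2Fimage.tif
```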
&lt;br /&gt;
=== Download package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;chunk number&amp;gt;/&amp;lt;/code&amp;gt; (for [[Storage Service#LOCKSS-o-matic | LOCKSS]] harvesting)&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the package&lt;br /&gt;
&lt;br /&gt;
Returns the entire package as a single file. If the AIP is uncompressed, it is packaged into a single file using &amp;lt;code&amp;gt;tar&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If the download URL has a chunk number, it will attempt to serve the specified LOCKSS chunk of that package. If the package is not in LOCKSS, it will return the whole package.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
&lt;br /&gt;
=== Get pointer file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/pointer_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the pointer file.&lt;br /&gt;
&lt;br /&gt;
=== Check fixity ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/check_fixity/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;force_local&amp;lt;/code&amp;gt;: If true, download and run fixity on the AIP locally, instead of using the Space-provided fixity if available.&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True if the verification succeeded, False if the verification failed, None if the scan could not start&lt;br /&gt;
** &amp;lt;code&amp;gt;message&amp;lt;/code&amp;gt;: Human-readable string explaining the report; it will be empty for successful scans.&lt;br /&gt;
** &amp;lt;code&amp;gt;failures&amp;lt;/code&amp;gt;: List of 0 or more errors&lt;br /&gt;
** &amp;lt;code&amp;gt;timestamp&amp;lt;/code&amp;gt;: ISO-formatted string with the datetime of the last fixity check. If the check was performed by an external system, this is provided by that system. If it is not provided, or on error, it is None.&lt;br /&gt;
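&lt;br /&gt;
The following minimal Python sketch turns a parsed response into a one-line summary and handles each of the three possible values of &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt; separately. The function name and wording are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def summarize_fixity(report):
    """Summarize a check_fixity response that has been parsed into a dict."""
    success = report.get('success')
    message = report.get('message') or ''
    if success is None:
        # None means the scan could not start.
        return 'not run: ' + message
    if success:
        return 'passed (last checked: {})'.format(report.get('timestamp'))
    failures = report.get('failures') or []
    return 'failed with {} failure(s): {}'.format(len(failures), message)
```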
&lt;br /&gt;
=== AIP storage callback request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/send_callback/post_store/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
Request to call any Callbacks configured to run post-storage for this AIP.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: This only works on locally available AIPs (AIPs stored in Spaces that are available via a UNIX filesystem layer).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True&lt;br /&gt;
** &amp;lt;code&amp;gt;package&amp;lt;/code&amp;gt;: UUID of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dictionaries with file information. Each dictionary has:&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_id&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
*** &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_package&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
*** &amp;lt;code&amp;gt;checksum&amp;lt;/code&amp;gt;: Checksum of the file, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Returns metadata about every file within the package.&lt;br /&gt;
&lt;br /&gt;
=== Update file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON list of dictionaries with information on the files to be added. Each dict must have the following attributes:&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Adds a set of files to a package.&lt;br /&gt;
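&lt;br /&gt;
The following minimal Python sketch builds the request body, with one dict per file containing the attributes listed above. The helper name and sample UUIDs are hypothetical, and the sketch assumes a single accession number for all files.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def build_contents_payload(files, sip_uuid, origin_uuid, accession_id=''):
    """JSON list body for PUT /api/v2/file/{uuid}/contents/.

    files is an iterable of (relative_path, file_uuid) pairs.
    """
    return json.dumps([
        {
            'relative_path': relative_path,
            'fileuuid': file_uuid,
            'accessionid': accession_id,
            'sipuuid': sip_uuid,
            'origin': origin_uuid,
        }
        for relative_path, file_uuid in files
    ])
```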
&lt;br /&gt;
=== Delete file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': DELETE&lt;br /&gt;
&lt;br /&gt;
Removes all file records associated with this package.&lt;br /&gt;
&lt;br /&gt;
=== Query file information on packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/metadata/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
* '''Parameters''': Query string parameters. At least one is required; the rest are optional.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
* '''Response''': JSON. List of dicts with file information about the files that match the query.&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;file_extension&amp;lt;/code&amp;gt;: File extension&lt;br /&gt;
** &amp;lt;code&amp;gt;filename&amp;lt;/code&amp;gt;: Name of the file, without the path.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
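&lt;br /&gt;
The following minimal Python sketch builds the query string and enforces the rule that at least one parameter is required. The function name is hypothetical, and the sketch covers only the GET form of the request.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

QUERY_FIELDS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid')


def metadata_query_url(**criteria):
    """Build GET /api/v2/file/metadata/ from one or more query fields."""
    unknown = sorted(set(criteria) - set(QUERY_FIELDS))
    if unknown:
        raise ValueError('unsupported fields: ' + ', '.join(unknown))
    # Keep the documented field order and drop empty values.
    params = [(name, criteria[name]) for name in QUERY_FIELDS if criteria.get(name)]
    if not params:
        raise ValueError('at least one of these is required: ' + ', '.join(QUERY_FIELDS))
    return '/api/v2/file/metadata/?' + urlencode(params)
```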
&lt;br /&gt;
=== Reingest AIP ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/reingest/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline to reingest on&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt;: Type of reingest to start. One of &amp;lt;code&amp;gt;METADATA_ONLY&amp;lt;/code&amp;gt; (metadata-only reingest), &amp;lt;code&amp;gt;OBJECTS&amp;lt;/code&amp;gt; (partial reingest), &amp;lt;code&amp;gt;FULL&amp;lt;/code&amp;gt; (full reingest)&lt;br /&gt;
** &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt;: Optional. Name of the processing configuration to use on full reingest&lt;br /&gt;
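&lt;br /&gt;
The following minimal Python sketch builds the request body. It checks &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt; against the three values above and adds &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt; only when one is given. The helper name and sample values are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
REINGEST_TYPES = ('METADATA_ONLY', 'OBJECTS', 'FULL')


def build_reingest_payload(pipeline_uuid, reingest_type, processing_config=None):
    """Body for POST /api/v2/file/{uuid}/reingest/."""
    if reingest_type not in REINGEST_TYPES:
        raise ValueError('reingest_type must be one of ' + ', '.join(REINGEST_TYPES))
    payload = {'pipeline': pipeline_uuid, 'reingest_type': reingest_type}
    # Optional; documented as the processing configuration for full reingest.
    if processing_config is not None:
        payload['processing_config'] = processing_config
    return payload
```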
&lt;br /&gt;
=== SWORD endpoints ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/media/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/state/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Development documentation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12588</id>
		<title>Storage Service API</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12588"/>
		<updated>2018-07-31T21:08:28Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Create new location */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Development]] &amp;gt; Storage Service API&lt;br /&gt;
&lt;br /&gt;
The [[Storage Service]] API provides programmatic access for moving files around the storage areas that the Storage Service has access to.&lt;br /&gt;
&lt;br /&gt;
The API is written using [http://django-tastypie.readthedocs.io/en/latest/ TastyPie].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: TastyPie is less well supported than [http://www.django-rest-framework.org/ Django REST Framework], both in terms of docs &amp;amp; community. We should look at replacing TastyPie with DRF.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Endpoints require authentication with a username and API key. These can be submitted as GET parameters (e.g. &amp;lt;code&amp;gt;?username=test&amp;amp;api_key=e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;) or as a header (e.g. &amp;lt;code&amp;gt;Authorization: ApiKey test:e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;).&lt;br /&gt;
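&lt;br /&gt;
Both forms can be produced with a few lines of Python. In this sketch the helper names are hypothetical, and the sample username and key are the ones shown above.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def api_key_header(username, api_key):
    """Authorization header in the form ApiKey username:api_key."""
    return {'Authorization': 'ApiKey {}:{}'.format(username, api_key)}


def api_key_query(username, api_key):
    """The same credentials as username and api_key GET parameters."""
    return '?' + urlencode({'username': username, 'api_key': api_key})


headers = api_key_header('test', 'e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2')
```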
&lt;br /&gt;
== A note about browsing ==&lt;br /&gt;
&lt;br /&gt;
A detailed schema for each resource can be retrieved by appending &amp;quot;schema/&amp;quot; to the resource's &amp;quot;get all&amp;quot; URL.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H &amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/location/schema/?format=json'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;allowed_detail_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;,&lt;br /&gt;
        &amp;quot;post&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;allowed_list_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;default_format&amp;quot;: &amp;quot;application/json&amp;quot;,&lt;br /&gt;
    &amp;quot;default_limit&amp;quot;: 20,&lt;br /&gt;
    &amp;quot;fields&amp;quot;: {&lt;br /&gt;
        &amp;quot;description&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;description&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;enabled&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: true,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;True if space can be accessed.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;boolean&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Enabled&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Many related resources. Can be either a list of URIs or list of individually nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/pipeline/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_many&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;pipeline&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Purpose of the space.  Eg. AIP storage, Transfer source&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Purpose&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;quota&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: null,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Size, in bytes (optional)&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: true,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Quota&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Path to location, relative to the storage space's path.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Relative Path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;resource_uri&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;resource uri&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;space&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;A single related resource. Can be either a URI or set of nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/space/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_one&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;space&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;used&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: 0,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Amount used, in bytes.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Used&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unique identifier&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: true,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;uuid&amp;quot;&lt;br /&gt;
        }&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;filtering&amp;quot;: {&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;quota&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;space&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;used&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: 1&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
This schema, among other things, describes the fields in the resource (including the schema URI of related resource fields) and the fields that allow filtering. Valid filtering values are a list of allowed Django ORM lookups (e.g. startswith, exact, lte), 1 or 2. A value of 1 (Tastypie's &amp;lt;code&amp;gt;ALL&amp;lt;/code&amp;gt;) allows every lookup on the field itself. A value of 2 (&amp;lt;code&amp;gt;ALL_WITH_RELATIONS&amp;lt;/code&amp;gt;) also allows filtering over the fields of the related resource. For example, locations can be filtered by their pipeline's UUID by joining the two field names with a double underscore in a query string parameter: &amp;lt;code&amp;gt;/api/v2/location/?pipeline__uuid=&amp;lt;uuid&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
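&lt;br /&gt;
As an illustration, the double-underscore filter syntax can be turned into request URLs with a few lines of Python. This is a minimal sketch that uses only the standard library. The helper name &amp;lt;code&amp;gt;build_filter_url&amp;lt;/code&amp;gt; is hypothetical and is not part of the Storage Service. The host is the one used in the examples on this page.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def build_filter_url(base_url, resource, filters):
    """Return a list-endpoint URL for ``resource`` with Tastypie filters.

    Keys such as "description__startswith" apply a Django ORM lookup to a
    field of the resource itself (filtering value 1); keys such as
    "pipeline__uuid" follow a related resource (filtering value 2).
    """
    url = "{}/api/v2/{}/".format(base_url.rstrip("/"), resource)
    query = urlencode(sorted(filters.items()))
    return url + "?" + query if query else url


print(build_filter_url(
    "http://localhost:8000",
    "location",
    {"pipeline__uuid": "dd354557-9e6e-4918-9fe3-a65b00ecb1af"},
))
# http://localhost:8000/api/v2/location/?pipeline__uuid=dd354557-9e6e-4918-9fe3-a65b00ecb1af
```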
&lt;br /&gt;
For more information on how to interact with the API, see:&lt;br /&gt;
&lt;br /&gt;
http://django-tastypie.readthedocs.io/en/v0.13.1/interacting.html&lt;br /&gt;
&lt;br /&gt;
== Pipeline ==&lt;br /&gt;
&lt;br /&gt;
=== Get all pipelines ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of pipelines. See [[#Get pipeline details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the pipelines in the system. The results can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by description or UUID. Disabled pipelines are not returned.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/pipeline/?description__startswith=Archivematica' | python -m json.tool&lt;br /&gt;
 {&lt;br /&gt;
     &amp;quot;meta&amp;quot;: {&lt;br /&gt;
         &amp;quot;limit&amp;quot;: 20,&lt;br /&gt;
         &amp;quot;next&amp;quot;: null,&lt;br /&gt;
         &amp;quot;offset&amp;quot;: 0,&lt;br /&gt;
         &amp;quot;previous&amp;quot;: null,&lt;br /&gt;
         &amp;quot;total_count&amp;quot;: 1&lt;br /&gt;
     },&lt;br /&gt;
     &amp;quot;objects&amp;quot;: [&lt;br /&gt;
         {&lt;br /&gt;
             &amp;quot;description&amp;quot;: &amp;quot;Archivematica on alouette&amp;quot;,&lt;br /&gt;
             &amp;quot;remote_name&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;br /&gt;
             &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/dd354557-9e6e-4918-9fe3-a65b00ecb1af/&amp;quot;,&lt;br /&gt;
             &amp;quot;uuid&amp;quot;: &amp;quot;dd354557-9e6e-4918-9fe3-a65b00ecb1af&amp;quot;&lt;br /&gt;
         }&lt;br /&gt;
     ]&lt;br /&gt;
 }&lt;br /&gt;
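&lt;br /&gt;
The list response is paginated. &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt; gives the page &amp;lt;code&amp;gt;limit&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;offset&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;next&amp;lt;/code&amp;gt; is null on the last page. The following minimal Python sketch collects every object by following &amp;lt;code&amp;gt;meta.next&amp;lt;/code&amp;gt;. It assumes, as with Tastypie's default paginator, that &amp;lt;code&amp;gt;next&amp;lt;/code&amp;gt; is a path relative to the host. The function names are hypothetical. The API key in the usage comment is the test key from the example above.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.request import Request, urlopen


def make_fetcher(host, username, api_key):
    """Return a function that GETs an API path and decodes the JSON body."""
    headers = {"Authorization": "ApiKey {}:{}".format(username, api_key)}

    def fetch_json(path):
        with urlopen(Request(host.rstrip("/") + path, headers=headers)) as resp:
            return json.load(resp)

    return fetch_json


def iter_objects(fetch_json, first_path):
    """Yield every object of a paginated list, following meta["next"]."""
    path = first_path
    while path:
        page = fetch_json(path)
        for obj in page["objects"]:
            yield obj
        # "next" is null (None) on the last page, which ends the loop.
        path = page["meta"]["next"]


# Usage against a running Storage Service (not executed here):
# fetch = make_fetcher("http://localhost:8000", "test",
#                      "95141fc645ed97a95893f1f865d24687f89a27ad")
# for pipeline in iter_objects(fetch, "/api/v2/pipeline/"):
#     print(pipeline["uuid"], pipeline["description"])
```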
&lt;br /&gt;
=== Create new pipeline ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain fields for a new pipeline: &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_key&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_username&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt;: If True, associates default [[Storage Service#Locations | Locations]] with the newly created pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;shared_path&amp;lt;/code&amp;gt;: If default locations are created, create the [[Storage Service#Currently Processing | processing]] location at this path in the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: IP address or hostname of the pipeline, used by the Storage Service when making API calls to it.&lt;br /&gt;
*** Before v0.11.0: If &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt; is set, the Storage Service tries to guess the value from the request's &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; (the client's IP address).&lt;br /&gt;
*** In v0.11.0 or newer: If not provided, the Storage Service tries to guess the value from the request's &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt;.&lt;br /&gt;
* '''Response''': JSON with data for the pipeline&lt;br /&gt;
&lt;br /&gt;
If the 'Pipelines disabled on creation' setting is set, the pipeline will be disabled by default, and will not respond to queries.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X POST -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; -H&amp;quot;Content-Type: application/json&amp;quot; -d'{&amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;, &amp;quot;create_default_locations&amp;quot;: true, &amp;quot;api_username&amp;quot;: &amp;quot;demo&amp;quot;, &amp;quot;api_key&amp;quot;: &amp;quot;03ecb307f5b8012f4771d245d534830378a87259&amp;quot;}' 'http://192.168.1.42:8000/api/v2/pipeline/'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;create_default_locations&amp;quot;: true,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;,&lt;br /&gt;
    &amp;quot;remote_name&amp;quot;: &amp;quot;192.168.1.42&amp;quot;,&lt;br /&gt;
    &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/99354557-9e6e-4918-9fe3-a65b00ecb199/&amp;quot;,&lt;br /&gt;
    &amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
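&lt;br /&gt;
The same request can be built in Python with the standard library. This is a sketch, not an official client. The function name and its parameter names are hypothetical. The sketch assumes that the &amp;lt;code&amp;gt;Authorization&amp;lt;/code&amp;gt; header carries the Storage Service user's credentials, while &amp;lt;code&amp;gt;api_username&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;api_key&amp;lt;/code&amp;gt; in the body are the pipeline's credentials. Optional fields are left out when not given, so the Storage Service can apply its own defaults (such as guessing &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.request import Request


def pipeline_create_request(host, ss_user, ss_key, pipeline_uuid, description,
                            api_username, api_key,
                            create_default_locations=False,
                            shared_path=None, remote_name=None):
    """Build (but do not send) a POST /api/v2/pipeline/ request."""
    body = {
        "uuid": pipeline_uuid,
        "description": description,
        "api_username": api_username,
        "api_key": api_key,
        "create_default_locations": create_default_locations,
    }
    # Only send optional fields that were given explicitly.
    if shared_path is not None:
        body["shared_path"] = shared_path
    if remote_name is not None:
        body["remote_name"] = remote_name
    return Request(
        host.rstrip("/") + "/api/v2/pipeline/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "ApiKey {}:{}".format(ss_user, ss_key),
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = pipeline_create_request(
    "http://192.168.1.42:8000", "test",
    "95141fc645ed97a95893f1f865d24687f89a27ad",
    "99354557-9e6e-4918-9fe3-a65b00ecb199", "Test pipeline",
    "demo", "03ecb307f5b8012f4771d245d534830378a87259",
    create_default_locations=True,
)
# Sending it with urllib.request.urlopen(req) would perform the POST.
```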
&lt;br /&gt;
=== Get pipeline details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Pipeline description&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: IP or hostname of the pipeline. For use in API calls&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI for this pipeline in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
&lt;br /&gt;
== Space ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Is there no way to create Spaces in the API?&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get all spaces ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Protocol that the [[Storage Service#Space | Space]] uses. Must be given as the protocol's database code.&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size in bytes. Can use greater than (size__gt=1024), less than (size__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. Can use greater than (used__gt=1024), less than (used__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the Space&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of spaces. See [[#Get space details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the spaces in the system. The results can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by several fields: access protocol, path, size, amount used, UUID and verified status. Disabled spaces are not returned.&lt;br /&gt;
&lt;br /&gt;
=== Get space details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Database code for the access protocol&lt;br /&gt;
** &amp;lt;code&amp;gt;last_verified&amp;lt;/code&amp;gt;: Date of last verification. This is a stub feature&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI to the resource in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size of the space in bytes.&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. &lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the space&lt;br /&gt;
** &amp;lt;code&amp;gt;verified&amp;lt;/code&amp;gt;: If the space is verified. This is a stub feature&lt;br /&gt;
** Other space-specific fields&lt;br /&gt;
&lt;br /&gt;
=== Browse space path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Space to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;, files or directories&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
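&lt;br /&gt;
Paths in version 2 responses must be base64-decoded before use. The following minimal Python sketch does this, assuming the names are UTF-8 encoded. The function name is hypothetical, and keys other than &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; are ignored.&lt;br /&gt;
&lt;br /&gt;
```python
import base64


def decode_browse_response(data):
    """Decode the base64-encoded paths of a version 2 browse response."""
    return {
        key: [base64.b64decode(value).decode("utf-8") for value in data.get(key, [])]
        for key in ("entries", "directories")
    }


encoded = {
    "entries": [base64.b64encode(name).decode("ascii")
                for name in (b"objects", b"metadata.csv")],
    "directories": [base64.b64encode(b"objects").decode("ascii")],
}
print(decode_browse_response(encoded))
# {'entries': ['objects', 'metadata.csv'], 'directories': ['objects']}
```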
&lt;br /&gt;
== Location ==&lt;br /&gt;
&lt;br /&gt;
=== Get all locations ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new location ===&lt;br /&gt;
&lt;br /&gt;
Added in v0.12 - see [https://github.com/artefactual/archivematica-storage-service/issues/367 issue 367] and [https://github.com/archivematica/Issues/issues/37 issue 37].&lt;br /&gt;
&lt;br /&gt;
This endpoint creates a location in the storage service, but it doesn't actually create the directory that the location points to.  &lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the location.&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: List of pipeline URIs (the field is a to-many relation).&lt;br /&gt;
** &amp;lt;code&amp;gt;space&amp;lt;/code&amp;gt;: URI of the space.&lt;br /&gt;
** &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt;: If true, this location will be the default for its purpose.&lt;br /&gt;
** &amp;lt;code&amp;gt;purpose&amp;lt;/code&amp;gt;: Purpose code, one of the following values:&lt;br /&gt;
*** &amp;lt;code&amp;gt;AR&amp;lt;/code&amp;gt; (AIP_RECOVERY)&lt;br /&gt;
*** &amp;lt;code&amp;gt;AS&amp;lt;/code&amp;gt; (AIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;CP&amp;lt;/code&amp;gt; (CURRENTLY_PROCESSING)&lt;br /&gt;
*** &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (DIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SD&amp;lt;/code&amp;gt; (SWORD_DEPOSIT)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (STORAGE_SERVICE_INTERNAL)&lt;br /&gt;
*** &amp;lt;code&amp;gt;BL&amp;lt;/code&amp;gt; (BACKLOG)&lt;br /&gt;
*** &amp;lt;code&amp;gt;TS&amp;lt;/code&amp;gt; (TRANSFER_SOURCE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;RP&amp;lt;/code&amp;gt; (REPLICATOR)&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative to the space's path.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl -s -d '{&lt;br /&gt;
    &amp;quot;pipeline&amp;quot;: [&amp;quot;/api/v2/pipeline/90707555-244f-47af-8271-66496a6a965b/&amp;quot;],&lt;br /&gt;
    &amp;quot;purpose&amp;quot;: &amp;quot;TS&amp;quot;,&lt;br /&gt;
    &amp;quot;relative_path&amp;quot;: &amp;quot;foo/bar&amp;quot;,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;foobar&amp;quot;,&lt;br /&gt;
    &amp;quot;space&amp;quot;: &amp;quot;/api/v2/space/141593ff-2a27-44a1-9de1-917573fa0f4a/&amp;quot;&lt;br /&gt;
}' \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
        &amp;quot;http://127.0.0.1:62081/api/v2/location/&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
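&lt;br /&gt;
The following minimal Python sketch builds the same request body, assuming the space and pipeline UUIDs are already known. The function name is hypothetical. It checks the purpose code against the list above. It also follows the schema's &amp;lt;code&amp;gt;related_type&amp;lt;/code&amp;gt; values: &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; is sent as a list of URIs (to_many) and &amp;lt;code&amp;gt;space&amp;lt;/code&amp;gt; as a single URI (to_one).&lt;br /&gt;
&lt;br /&gt;
```python
import json

# Purpose codes listed for this endpoint.
PURPOSES = {"AR", "AS", "CP", "DS", "SD", "SS", "BL", "TS", "RP"}


def location_body(pipeline_uuids, space_uuid, purpose, relative_path,
                  description="", default=False):
    """Return the JSON body for POST /api/v2/location/."""
    if purpose not in PURPOSES:
        raise ValueError("unknown purpose code: {!r}".format(purpose))
    return json.dumps({
        # to_many relation: always a list, even for a single pipeline.
        "pipeline": ["/api/v2/pipeline/{}/".format(u) for u in pipeline_uuids],
        # to_one relation: a single URI.
        "space": "/api/v2/space/{}/".format(space_uuid),
        "purpose": purpose,
        "relative_path": relative_path,
        "description": description,
        "default": default,
    })


print(location_body(["90707555-244f-47af-8271-66496a6a965b"],
                    "141593ff-2a27-44a1-9de1-917573fa0f4a",
                    "TS", "foo/bar", "foobar"))
```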
&lt;br /&gt;
=== Get location details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Move files to this location ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location the files should be moved from&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the [[Storage Service#Pipeline | pipeline]]. Both Locations must be associated with this pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dicts containing &amp;lt;code&amp;gt;source&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;destination&amp;lt;/code&amp;gt;: the paths of each file to be moved, relative to the origin Location and to this Location respectively.&lt;br /&gt;
&lt;br /&gt;
Intended for creating Transfers, SIPs and other cases where files need to be moved but not tracked by the Storage Service.&lt;br /&gt;
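&lt;br /&gt;
The following is a minimal Python sketch of this request body. The function name and its input of (source, destination) pairs are hypothetical conveniences. The field names are the ones listed above. The Location named in the request URL is the destination.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def move_files_body(origin_location_uuid, pipeline_uuid, moves):
    """Return the JSON body for POST /api/v2/location/UUID/.

    ``moves`` is an iterable of (source, destination) pairs. Each source is
    relative to the origin Location; each destination is relative to the
    Location named in the request URL.
    """
    return json.dumps({
        "origin_location": "/api/v2/location/{}/".format(origin_location_uuid),
        "pipeline": "/api/v2/pipeline/{}/".format(pipeline_uuid),
        "files": [{"source": src, "destination": dst} for src, dst in moves],
    })
```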
&lt;br /&gt;
=== Browse location path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Location to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;, files or directories&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
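&lt;br /&gt;
Because &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; is a subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;, a client can find the non-directory entries by taking the difference. The following minimal Python sketch handles both response versions. The function name is hypothetical. As a simplification, it treats every non-directory entry as a file, and it assumes UTF-8 names for version 2.&lt;br /&gt;
&lt;br /&gt;
```python
import base64


def split_entries(data, version=2):
    """Return (directories, files) from a Location browse response."""
    def decode(value):
        # Version 2 returns base64-encoded paths; version 1 returns strings.
        if version == 2:
            return base64.b64decode(value).decode("utf-8")
        return value

    directories = [decode(d) for d in data["directories"]]
    dir_set = set(directories)
    files = [e for e in map(decode, data["entries"]) if e not in dir_set]
    return directories, files


print(split_entries({"entries": ["objects", "README.txt"],
                     "directories": ["objects"]}, version=1))
# (['objects'], ['README.txt'])
```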
&lt;br /&gt;
=== SWORD collection ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/sword/collection/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Package ==&lt;br /&gt;
&lt;br /&gt;
=== Get all packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON. Fields for a new package:&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the new package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;transfer&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposit&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from&lt;br /&gt;
** &amp;lt;code&amp;gt;related_package_uuid&amp;lt;/code&amp;gt;: UUID of a package that is related to this one. E.g. UUID of a DIP when storing an AIP&lt;br /&gt;
&lt;br /&gt;
Creates a database entry tracking the package (AIP, transfer, etc.). If the package is an AIP, DIP or AIC and the current_location is an AIP or DIP storage location, it also moves the files from the origin location to the current location. If the package is a transfer and the current_location is the transfer backlog, the transfer is moved as well.&lt;br /&gt;
&lt;br /&gt;
This is handled through the modified &amp;lt;code&amp;gt;obj_create&amp;lt;/code&amp;gt; function, which calls &amp;lt;code&amp;gt;Package.store_aip&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Package.backlog_transfer&amp;lt;/code&amp;gt;.&lt;br /&gt;
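&lt;br /&gt;
The rule above can be restated as a small Python predicate. This is only a paraphrase of the prose for illustration, not the Storage Service's code. The function name is hypothetical. Location purposes are given as the codes listed under [[#Create new location]]. The predicate reads 'an AIP or DIP storage location' as either purpose (AS or DS) for all three package types.&lt;br /&gt;
&lt;br /&gt;
```python
def files_moved_on_create(package_type, location_purpose):
    """True if POST /api/v2/file/ should also move the package's files.

    Paraphrases the documented behaviour: Package.store_aip for AIPs, AICs
    and DIPs sent to AIP (AS) or DIP (DS) storage, and
    Package.backlog_transfer for transfers sent to the backlog (BL).
    """
    if package_type in ("AIP", "AIC", "DIP"):
        return location_purpose in ("AS", "DS")
    if package_type == "transfer":
        return location_purpose == "BL"
    # Under this reading, other types are only recorded in the database.
    return False
```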
&lt;br /&gt;
=== Get package details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Update package contents ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Flag marking this request as a reingest. Reduces the chance of accidentally modifying an AIP.&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the existing package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from.  This must be the same pipeline reingest was started on (tracked through &amp;lt;code&amp;gt;Package.misc_attributes.reingest_pipeline&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Updates the contents of a package during reingest.  If the package is an AIP or AIC, currently stored in an AIP storage location, and the 'reingest' parameter is set, it will call &amp;lt;code&amp;gt;Package.finish_reingest&amp;lt;/code&amp;gt; and merge the new AIP with the existing one.&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Update package metadata ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PATCH&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Pipeline UUID or None.&lt;br /&gt;
&lt;br /&gt;
Used to update metadata stored in the database for the package.  Currently, this is used to update the reingest status.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Currently, this always sets Package.misc_attributes.reingest to None, regardless of what value was actually passed in.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;. Tastypie's &amp;lt;code&amp;gt;update_in_place&amp;lt;/code&amp;gt;, which handles PATCH requests, is also involved.&lt;br /&gt;
&lt;br /&gt;
=== Delete package request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/delete_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for deleting the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline the delete request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the deletion. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the deletion.&lt;br /&gt;
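&lt;br /&gt;
The following minimal Python sketch builds this body and applies the documented &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt; check on the client side. The function name is hypothetical. As documented, this endpoint takes the pipeline's UUID, while [[#Recover AIP request]] takes the pipeline's URI.&lt;br /&gt;
&lt;br /&gt;
```python
import json


def delete_aip_body(event_reason, pipeline_uuid, user_id, user_email):
    """Return the JSON body for POST /api/v2/file/UUID/delete_aip/."""
    # user_id is the user's ID on the pipeline: an integer greater than 0.
    # bool is rejected explicitly because it is a subclass of int.
    valid = isinstance(user_id, int) and not isinstance(user_id, bool)
    if not valid or not user_id > 0:
        raise ValueError("user_id must be an integer greater than 0")
    return json.dumps({
        "event_reason": event_reason,
        "pipeline": pipeline_uuid,
        "user_id": user_id,
        "user_email": user_email,
    })
```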
&lt;br /&gt;
=== Recover AIP request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/recover_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for recovering the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the recovery request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the recovery. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the recovery.&lt;br /&gt;
&lt;br /&gt;
=== Download single file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/extract_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt;: Path to the file to download, relative to the package path.&lt;br /&gt;
* '''Response''': Stream of the requested file&lt;br /&gt;
&lt;br /&gt;
Returns a single file from the Package. If the package is compressed, the whole AIP is downloaded and the requested file is extracted from it.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence. Currently, the Storage Service has no way to check if a file exists except by downloading and extracting the AIP.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
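&lt;br /&gt;
The following minimal Python sketch calls this endpoint with the standard library's &amp;lt;code&amp;gt;urllib&amp;lt;/code&amp;gt;, assuming an already-built &amp;lt;code&amp;gt;Authorization&amp;lt;/code&amp;gt; header value. The function names are hypothetical. The sketch URL-encodes &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt;. It treats a &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; response as 'not available yet' rather than reading the body as file content, and assumes the caller may retry later.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_file_path(package_uuid, relative_path_to_file):
    """Path of the extract_file endpoint with its query string encoded."""
    query = urlencode({"relative_path_to_file": relative_path_to_file})
    return "/api/v2/file/{}/extract_file/?{}".format(package_uuid, query)


def fetch_single_file(host, auth_header, package_uuid, relative_path):
    """Return the file's bytes, or None if the server answered 202."""
    req = Request(host.rstrip("/") + extract_file_path(package_uuid, relative_path),
                  headers={"Authorization": auth_header})
    with urlopen(req) as resp:
        if resp.status == 202:
            # Package not locally available (e.g. Arkivum); the administrator
            # has been emailed, so the caller may want to retry later.
            return None
        return resp.read()


print(extract_file_path("dd354557-9e6e-4918-9fe3-a65b00ecb1af",
                        "objects/a b.txt"))
# /api/v2/file/dd354557-9e6e-4918-9fe3-a65b00ecb1af/extract_file/?relative_path_to_file=objects%2Fa+b.txt
```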
&lt;br /&gt;
=== Download package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;chunk number&amp;gt;/&amp;lt;/code&amp;gt; (for [[Storage Service#LOCKSS-o-matic | LOCKSS]] harvesting)&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the package&lt;br /&gt;
&lt;br /&gt;
Returns the entire package as a single file. If the AIP is uncompressed, it is packed into a single file using &amp;lt;code&amp;gt;tar&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If the download URL has a chunk number, it will attempt to serve the specified LOCKSS chunk of that package. If the package is not in LOCKSS, it will return the whole package.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
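&lt;br /&gt;
The following minimal Python sketch saves a package to disk with the standard library's &amp;lt;code&amp;gt;urllib&amp;lt;/code&amp;gt;. The function names are hypothetical. The sketch streams the body in pieces with &amp;lt;code&amp;gt;shutil.copyfileobj&amp;lt;/code&amp;gt; so that a large AIP is not held in memory. It returns False on a &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; response instead of writing a file.&lt;br /&gt;
&lt;br /&gt;
```python
import shutil
from urllib.request import Request, urlopen


def download_path(package_uuid, chunk_number=None):
    """Path of the download endpoint; chunk numbers are for LOCKSS harvesting."""
    path = "/api/v2/file/{}/download/".format(package_uuid)
    if chunk_number is not None:
        path += "{}/".format(int(chunk_number))
    return path


def download_package(host, auth_header, package_uuid, dest_path):
    """Stream the package into dest_path; return False on a 202 response."""
    req = Request(host.rstrip("/") + download_path(package_uuid),
                  headers={"Authorization": auth_header})
    with urlopen(req) as resp:
        if resp.status == 202:
            return False
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(resp, out)
    return True
```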
&lt;br /&gt;
=== Get pointer file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/pointer_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the pointer file.&lt;br /&gt;
&lt;br /&gt;
=== Check fixity ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/check_fixity/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;force_local&amp;lt;/code&amp;gt;: If true, download and run fixity on the AIP locally, instead of using the Space-provided fixity if available.&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True if the verification succeeded, False if the verification failed, None if the scan could not start&lt;br /&gt;
** &amp;lt;code&amp;gt;message&amp;lt;/code&amp;gt;: Human-readable string explaining the report; it will be empty for successful scans.&lt;br /&gt;
** &amp;lt;code&amp;gt;failures&amp;lt;/code&amp;gt;: List of 0 or more errors&lt;br /&gt;
** &amp;lt;code&amp;gt;timestamp&amp;lt;/code&amp;gt;: ISO-formatted string with the datetime of the last fixity check. If the check was performed by an external system, this will be provided by that system. If not provided, or on error, it will be None.&lt;br /&gt;
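&lt;br /&gt;
Because &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt; has three states, a client should test for None (JSON null) before treating a falsy value as a failure. The following minimal Python sketch works on the decoded response. The function name, the summary wording and the sample report are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def summarize_fixity(report):
    """Return a one-line summary of a decoded check_fixity response."""
    success = report.get("success")
    if success is None:
        # The scan could not start; the message explains why.
        return "not run: " + (report.get("message") or "no reason given")
    if success:
        return "passed"
    failures = report.get("failures") or []
    return "failed with {} error(s): {}".format(len(failures),
                                                report.get("message", ""))


print(summarize_fixity({"success": False, "message": "Bag validation failed",
                        "failures": ["checksum mismatch"], "timestamp": None}))
# failed with 1 error(s): Bag validation failed
```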
&lt;br /&gt;
=== AIP storage callback request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/send_callback/post_store/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
Triggers any Callbacks configured to run post-storage for this AIP.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: This only works on locally available AIPs (AIPs stored in Spaces that are available via a UNIX filesystem layer).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True&lt;br /&gt;
** &amp;lt;code&amp;gt;package&amp;lt;/code&amp;gt;: UUID of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dictionaries with file information. Each dictionary has:&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_id&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
*** &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_package&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
*** &amp;lt;code&amp;gt;checksum&amp;lt;/code&amp;gt;: Checksum of the file, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Returns metadata about every file within the package.&lt;br /&gt;
&lt;br /&gt;
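A short Python sketch of consuming this response, assuming only the fields listed above (the helper name is hypothetical). It builds a lookup from each file's relative path to its checksum and skips files whose checksum is an empty string:&lt;br /&gt;
&lt;br /&gt;
```python
import json


def checksums_by_path(body):
    """Map each file's relative path to its checksum, skipping blanks."""
    data = json.loads(body)
    return {f['name']: f['checksum'] for f in data['files'] if f['checksum']}
```
&lt;br /&gt;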
=== Update file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON list of dictionaries describing the files to be added. Each dictionary must have the following keys:&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Adds a set of files to a package.&lt;br /&gt;
&lt;br /&gt;
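A minimal Python sketch of building the body for this PUT, assuming the keys listed above are exactly the required ones. The helper name is hypothetical. Checking the keys on the client side catches a malformed record before anything is sent:&lt;br /&gt;
&lt;br /&gt;
```python
import json

REQUIRED_KEYS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid', 'origin')


def contents_put_body(files):
    """Serialize file records for the PUT, failing early on missing keys."""
    for record in files:
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError('missing keys: ' + ', '.join(missing))
    return json.dumps(files)
```
&lt;br /&gt;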
=== Delete file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': DELETE&lt;br /&gt;
&lt;br /&gt;
Removes all file records associated with this package.&lt;br /&gt;
&lt;br /&gt;
=== Query file information on packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/metadata/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
* '''Parameters''': Query string parameters. At least one is required; the rest are optional.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
* '''Response''': JSON. List of dicts with file information about the files that match the query.&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;file_extension&amp;lt;/code&amp;gt;: File extension&lt;br /&gt;
** &amp;lt;code&amp;gt;filename&amp;lt;/code&amp;gt;: Name of the file, sans path.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
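A minimal Python sketch of building a GET query for this endpoint that enforces the &amp;quot;at least one parameter&amp;quot; rule on the client side. The helper name is hypothetical, and parameters passed as None are simply omitted:&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

ALLOWED_FILTERS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid')


def metadata_query_url(base_url, **filters):
    """Build a GET URL for the metadata query, requiring at least one filter."""
    params = {key: value for key, value in filters.items() if value is not None}
    unknown = sorted(set(params) - set(ALLOWED_FILTERS))
    if unknown:
        raise ValueError('unsupported filters: ' + ', '.join(unknown))
    if not params:
        raise ValueError('at least one filter is required')
    return f'{base_url}/api/v2/file/metadata/?' + urlencode(params)
```
&lt;br /&gt;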
=== Reingest AIP ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/reingest/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline to reingest on&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt;: Type of reingest to start. One of &amp;lt;code&amp;gt;METADATA_ONLY&amp;lt;/code&amp;gt; (metadata-only reingest), &amp;lt;code&amp;gt;OBJECTS&amp;lt;/code&amp;gt; (partial reingest), &amp;lt;code&amp;gt;FULL&amp;lt;/code&amp;gt; (full reingest)&lt;br /&gt;
** &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt;: Optional. Name of the processing configuration to use on full reingest&lt;br /&gt;
&lt;br /&gt;
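A minimal Python sketch of assembling the JSON body, assuming the three type codes listed above are the only valid values. The helper name is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
import json

REINGEST_TYPES = ('METADATA_ONLY', 'OBJECTS', 'FULL')


def reingest_body(pipeline_uuid, reingest_type, processing_config=None):
    """Build the JSON body for a reingest request."""
    if reingest_type not in REINGEST_TYPES:
        raise ValueError('reingest_type must be one of ' + ', '.join(REINGEST_TYPES))
    body = {'pipeline': pipeline_uuid, 'reingest_type': reingest_type}
    if processing_config is not None:
        # Only meaningful for FULL reingest, per the parameter list above.
        body['processing_config'] = processing_config
    return json.dumps(body)
```
&lt;br /&gt;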
=== SWORD endpoints ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/media/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/state/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
[[Category:Development documentation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12587</id>
		<title>Storage Service API</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Storage_Service_API&amp;diff=12587"/>
		<updated>2018-07-31T20:44:05Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: /* Create new location */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
The [[Storage Service]] API provides programmatic access for moving files around the storage areas that the Storage Service can access.&lt;br /&gt;
&lt;br /&gt;
The API is written using [http://django-tastypie.readthedocs.io/en/latest/ TastyPie].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: TastyPie is less well supported than [http://www.django-rest-framework.org/ Django REST Framework], in terms of both documentation and community. We should look at replacing TastyPie with DRF.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Endpoints require authentication with a username and API key. These can be submitted as GET parameters (e.g. &amp;lt;code&amp;gt;?username=test&amp;amp;api_key=e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;) or in a header (e.g. &amp;lt;code&amp;gt;Authorization: ApiKey test:e6282adabed84e39ffe451f8bf6ff1a67c1fc9f2&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
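A minimal Python sketch of the two ways to attach credentials described above, using only the standard library. The helper name is hypothetical. The header form keeps the API key out of URLs, which otherwise tend to end up in server and proxy logs:&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def authed_request(url, username, api_key, use_header=True):
    """Return a urllib Request carrying the API key in a header or the query."""
    if use_header:
        return Request(url, headers={'Authorization': f'ApiKey {username}:{api_key}'})
    parts = urlsplit(url)
    # Keep any existing query parameters and append the credentials.
    query = parse_qsl(parts.query) + [('username', username), ('api_key', api_key)]
    return Request(urlunsplit(parts._replace(query=urlencode(query))))
```
&lt;br /&gt;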
== A note about browsing ==&lt;br /&gt;
&lt;br /&gt;
A detailed schema for each resource is available by appending &amp;lt;code&amp;gt;schema/&amp;lt;/code&amp;gt; to the resource's &amp;quot;get all&amp;quot; URL.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/location/schema/?format=json' | python -m json.tool&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;allowed_detail_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;,&lt;br /&gt;
        &amp;quot;post&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;allowed_list_http_methods&amp;quot;: [&lt;br /&gt;
        &amp;quot;get&amp;quot;&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;default_format&amp;quot;: &amp;quot;application/json&amp;quot;,&lt;br /&gt;
    &amp;quot;default_limit&amp;quot;: 20,&lt;br /&gt;
    &amp;quot;fields&amp;quot;: {&lt;br /&gt;
        &amp;quot;description&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;description&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;enabled&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: true,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;True if space can be accessed.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;boolean&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Enabled&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Many related resources. Can be either a list of URIs or list of individually nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/pipeline/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_many&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;pipeline&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Purpose of the space.  Eg. AIP storage, Transfer source&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Purpose&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;quota&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: null,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Size, in bytes (optional)&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: true,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Quota&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Path to location, relative to the storage space's path.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Relative Path&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;resource_uri&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unicode string data. Ex: \&amp;quot;Hello World\&amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: true,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;resource uri&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;space&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;No default provided.&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;A single related resource. Can be either a URI or set of nested resource data.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;related_schema&amp;quot;: &amp;quot;/api/v2/space/schema/&amp;quot;,&lt;br /&gt;
            &amp;quot;related_type&amp;quot;: &amp;quot;to_one&amp;quot;,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;related&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;space&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;used&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: false,&lt;br /&gt;
            &amp;quot;default&amp;quot;: 0,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Amount used, in bytes.&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: false,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;Used&amp;quot;&lt;br /&gt;
        },&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: {&lt;br /&gt;
            &amp;quot;blank&amp;quot;: true,&lt;br /&gt;
            &amp;quot;default&amp;quot;: &amp;quot;&amp;quot;,&lt;br /&gt;
            &amp;quot;help_text&amp;quot;: &amp;quot;Unique identifier&amp;quot;,&lt;br /&gt;
            &amp;quot;nullable&amp;quot;: false,&lt;br /&gt;
            &amp;quot;primary_key&amp;quot;: false,&lt;br /&gt;
            &amp;quot;readonly&amp;quot;: false,&lt;br /&gt;
            &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,&lt;br /&gt;
            &amp;quot;unique&amp;quot;: true,&lt;br /&gt;
            &amp;quot;verbose_name&amp;quot;: &amp;quot;uuid&amp;quot;&lt;br /&gt;
        }&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;filtering&amp;quot;: {&lt;br /&gt;
        &amp;quot;pipeline&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;purpose&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;quota&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;relative_path&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;space&amp;quot;: 2,&lt;br /&gt;
        &amp;quot;used&amp;quot;: 1,&lt;br /&gt;
        &amp;quot;uuid&amp;quot;: 1&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
This schema describes, among other things, the fields in the resource (including the schema URI of related resource fields) and the fields that allow filtering. Valid filtering values are either a list of Django ORM lookups (e.g. startswith, exact, lte) or 1 or 2. A value of 1 (TastyPie's &amp;lt;code&amp;gt;ALL&amp;lt;/code&amp;gt;) allows every lookup on that field. A value of 2 (&amp;lt;code&amp;gt;ALL_WITH_RELATIONS&amp;lt;/code&amp;gt;) additionally allows filtering on the fields of the related resource. For example, locations can be filtered by the UUID of their pipeline by joining the field names with a double underscore: &amp;lt;code&amp;gt;/api/v2/location/?pipeline__uuid=&amp;lt;uuid&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
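A minimal Python sketch of building such filtered URLs. The helper name is hypothetical. Python keyword arguments can carry the double-underscore lookup names directly, so both related-field filters and Django field lookups pass straight through as query parameters:&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def filter_url(base_url, resource, **lookups):
    """Build a filtered list URL; names like pipeline__uuid or used__gt
    are passed through unchanged as query parameters."""
    return f'{base_url}/api/v2/{resource}/?' + urlencode(sorted(lookups.items()))
```
&lt;br /&gt;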
For more info on how to interact with the API see:&lt;br /&gt;
&lt;br /&gt;
http://django-tastypie.readthedocs.io/en/v0.13.1/interacting.html&lt;br /&gt;
&lt;br /&gt;
== Pipeline ==&lt;br /&gt;
&lt;br /&gt;
=== Get all pipelines ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the pipeline&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of pipelines. See [[#Get pipeline details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the pipelines in the system. Results can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by description or UUID. Disabled pipelines are not returned.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X GET -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; 'http://localhost:8000/api/v2/pipeline/?description__startswith=Archivematica' | python -m json.tool&lt;br /&gt;
 {&lt;br /&gt;
     &amp;quot;meta&amp;quot;: {&lt;br /&gt;
         &amp;quot;limit&amp;quot;: 20,&lt;br /&gt;
         &amp;quot;next&amp;quot;: null,&lt;br /&gt;
         &amp;quot;offset&amp;quot;: 0,&lt;br /&gt;
         &amp;quot;previous&amp;quot;: null,&lt;br /&gt;
         &amp;quot;total_count&amp;quot;: 1&lt;br /&gt;
     },&lt;br /&gt;
     &amp;quot;objects&amp;quot;: [&lt;br /&gt;
         {&lt;br /&gt;
             &amp;quot;description&amp;quot;: &amp;quot;Archivematica on alouette&amp;quot;,&lt;br /&gt;
             &amp;quot;remote_name&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;br /&gt;
             &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/dd354557-9e6e-4918-9fe3-a65b00ecb1af/&amp;quot;,&lt;br /&gt;
             &amp;quot;uuid&amp;quot;: &amp;quot;dd354557-9e6e-4918-9fe3-a65b00ecb1af&amp;quot;&lt;br /&gt;
         }&lt;br /&gt;
     ]&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
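The &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt; block paginates results; the example above shows a limit of 20 per page. A minimal Python sketch of collecting every object by following &amp;lt;code&amp;gt;meta.next&amp;lt;/code&amp;gt; until it is null. The &amp;lt;code&amp;gt;fetch&amp;lt;/code&amp;gt; callable is a hypothetical stand-in for an authenticated HTTP GET that returns the response body:&lt;br /&gt;
&lt;br /&gt;
```python
import json


def iter_objects(fetch, first_path):
    """Yield every object from a paginated list endpoint.

    fetch is any callable that maps a path such as '/api/v2/pipeline/'
    to the raw JSON body of the response.
    """
    path = first_path
    while path:
        page = json.loads(fetch(path))
        yield from page['objects']
        # meta.next is a path to the following page, or null on the last one.
        path = page['meta']['next']
```
&lt;br /&gt;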
=== Create new pipeline ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** Should contain fields for a new pipeline: &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_key&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;api_username&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt;: If true, associate default [[Storage Service#Locations | Locations]] with the newly created pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;shared_path&amp;lt;/code&amp;gt;: If default locations are created, create the [[Storage Service#Currently Processing | processing]] location at this path in the local filesystem&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: URI of the pipeline.&lt;br /&gt;
*** Before v0.11.0: If &amp;lt;code&amp;gt;create_default_locations&amp;lt;/code&amp;gt; is set, SS will try to guess the value using the &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; header.&lt;br /&gt;
*** In v0.11.0 or newer: If not provided, SS will try to guess the value using the &amp;lt;code&amp;gt;REMOTE_ADDR&amp;lt;/code&amp;gt; header.&lt;br /&gt;
* '''Response''': JSON with data for the pipeline&lt;br /&gt;
&lt;br /&gt;
If the 'Pipelines disabled on creation' setting is enabled, the new pipeline is disabled by default and is not returned by queries.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 $ curl -X POST -H&amp;quot;Authorization: ApiKey test:95141fc645ed97a95893f1f865d24687f89a27ad&amp;quot; -H&amp;quot;Content-Type: application/json&amp;quot; -d'{&amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;, &amp;quot;create_default_locations&amp;quot;: true, &amp;quot;api_username&amp;quot;: &amp;quot;demo&amp;quot;, &amp;quot;api_key&amp;quot;: &amp;quot;03ecb307f5b8012f4771d245d534830378a87259&amp;quot;}' 'http://192.168.1.42:8000/api/v2/pipeline/'&lt;br /&gt;
 {&lt;br /&gt;
    &amp;quot;create_default_locations&amp;quot;: true,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;Test pipeline&amp;quot;,&lt;br /&gt;
    &amp;quot;remote_name&amp;quot;: &amp;quot;192.168.1.42&amp;quot;,&lt;br /&gt;
    &amp;quot;resource_uri&amp;quot;: &amp;quot;/api/v2/pipeline/99354557-9e6e-4918-9fe3-a65b00ecb199/&amp;quot;,&lt;br /&gt;
    &amp;quot;uuid&amp;quot;: &amp;quot;99354557-9e6e-4918-9fe3-a65b00ecb199&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
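The same request can be sketched in Python with the standard library. The helper name is hypothetical, and the request is only constructed here, not sent; pass it to &amp;lt;code&amp;gt;urllib.request.urlopen&amp;lt;/code&amp;gt; to send it:&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.request import Request


def create_pipeline_request(base_url, auth_header, fields,
                            create_default_locations=False, shared_path=None):
    """Build (but do not send) a POST that registers a new pipeline."""
    body = dict(fields)  # uuid, description, api_username, api_key, ...
    body['create_default_locations'] = create_default_locations
    if shared_path is not None:
        body['shared_path'] = shared_path
    return Request(
        f'{base_url}/api/v2/pipeline/',
        data=json.dumps(body).encode('utf-8'),
        headers={'Authorization': auth_header, 'Content-Type': 'application/json'},
        method='POST',
    )
```
&lt;br /&gt;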
=== Get pipeline details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/pipeline/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Pipeline description&lt;br /&gt;
** &amp;lt;code&amp;gt;remote_name&amp;lt;/code&amp;gt;: IP or hostname of the pipeline. For use in API calls&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI for this pipeline in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the pipeline&lt;br /&gt;
&lt;br /&gt;
== Space ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Is there no way to create Spaces in the API?&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get all spaces ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Protocol that the [[Storage Service#Space | Space]] uses. Must be searched based on the database code.&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size in bytes. Can use greater than (size__gt=1024), less than (size__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. Can use greater than (used__gt=1024), less than (used__lt=1024), and other Django [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#field-lookups field lookups].&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the Space&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;meta&amp;lt;/code&amp;gt;: Metadata on the response: number of hits, pagination information&lt;br /&gt;
** &amp;lt;code&amp;gt;objects&amp;lt;/code&amp;gt;: List of spaces. See [[#Get space details]] for format&lt;br /&gt;
&lt;br /&gt;
Returns information about all the spaces in the system. Results can be [http://django-tastypie.readthedocs.io/en/latest/resources.html#basic-filtering filtered] by several fields: access protocol, path, size, amount used, UUID and verified status. Disabled spaces are not returned.&lt;br /&gt;
&lt;br /&gt;
=== Get space details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;access_protocol&amp;lt;/code&amp;gt;: Database code for the access protocol&lt;br /&gt;
** &amp;lt;code&amp;gt;last_verified&amp;lt;/code&amp;gt;: Date of last verification. This is a stub feature&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Space's path&lt;br /&gt;
** &amp;lt;code&amp;gt;resource_uri&amp;lt;/code&amp;gt;: URI to the resource in the API&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Maximum size of the space in bytes.&lt;br /&gt;
** &amp;lt;code&amp;gt;used&amp;lt;/code&amp;gt;: Bytes stored in this space. &lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the space&lt;br /&gt;
** &amp;lt;code&amp;gt;verified&amp;lt;/code&amp;gt;: If the space is verified. This is a stub feature&lt;br /&gt;
** Other space-specific fields&lt;br /&gt;
&lt;br /&gt;
=== Browse space path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/space/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Space to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries (files or directories) at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1: Returns paths as strings&lt;br /&gt;
Version 2: Returns all paths base64 encoded&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
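With version 2, clients must decode each path themselves. A minimal Python sketch of that decoding, assuming the decoded bytes are UTF-8 (file names on disk may use another encoding) and that only &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; are of interest. The helper name is hypothetical. The same decoding applies to the Location browse endpoint:&lt;br /&gt;
&lt;br /&gt;
```python
import base64
import json


def decode_browse(body):
    """Decode the base64-encoded paths in a version 2 browse response."""
    data = json.loads(body)

    def decode(path):
        # Assumption: the decoded bytes are UTF-8 text.
        return base64.b64decode(path).decode('utf-8')

    return {key: [decode(p) for p in data.get(key, [])]
            for key in ('entries', 'directories')}
```
&lt;br /&gt;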
== Location ==&lt;br /&gt;
&lt;br /&gt;
=== Get all locations ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new location ===&lt;br /&gt;
&lt;br /&gt;
Added in v0.12 - see [https://github.com/artefactual/archivematica-storage-service/issues/367 issue 367] and [https://github.com/archivematica/Issues/issues/37 issue 37].&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;description&amp;lt;/code&amp;gt;: Description of the location.&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: List of pipeline URIs.&lt;br /&gt;
** &amp;lt;code&amp;gt;space&amp;lt;/code&amp;gt;: URI of the space.&lt;br /&gt;
** &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt;: If true, this location will be the default for its purpose.&lt;br /&gt;
** &amp;lt;code&amp;gt;purpose&amp;lt;/code&amp;gt;: Purpose code of the location. One of:&lt;br /&gt;
*** &amp;lt;code&amp;gt;AR&amp;lt;/code&amp;gt; (AIP_RECOVERY)&lt;br /&gt;
*** &amp;lt;code&amp;gt;AS&amp;lt;/code&amp;gt; (AIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;CP&amp;lt;/code&amp;gt; (CURRENTLY_PROCESSING)&lt;br /&gt;
*** &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (DIP_STORAGE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SD&amp;lt;/code&amp;gt; (SWORD_DEPOSIT)&lt;br /&gt;
*** &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (STORAGE_SERVICE_INTERNAL)&lt;br /&gt;
*** &amp;lt;code&amp;gt;BL&amp;lt;/code&amp;gt; (BACKLOG)&lt;br /&gt;
*** &amp;lt;code&amp;gt;TS&amp;lt;/code&amp;gt; (TRANSFER_SOURCE)&lt;br /&gt;
*** &amp;lt;code&amp;gt;RP&amp;lt;/code&amp;gt; (REPLICATOR)&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Path to the location, relative to the space's path.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl -s -d '{&lt;br /&gt;
    &amp;quot;pipeline&amp;quot;: [&amp;quot;/api/v2/pipeline/90707555-244f-47af-8271-66496a6a965b/&amp;quot;],&lt;br /&gt;
    &amp;quot;purpose&amp;quot;: &amp;quot;TS&amp;quot;,&lt;br /&gt;
    &amp;quot;relative_path&amp;quot;: &amp;quot;foo/bar&amp;quot;,&lt;br /&gt;
    &amp;quot;description&amp;quot;: &amp;quot;foobar&amp;quot;,&lt;br /&gt;
    &amp;quot;space&amp;quot;: &amp;quot;/api/v2/space/141593ff-2a27-44a1-9de1-917573fa0f4a/&amp;quot;&lt;br /&gt;
}' \&lt;br /&gt;
    -X POST \&lt;br /&gt;
    -H &amp;quot;Authorization: ApiKey test:test&amp;quot; \&lt;br /&gt;
    -H &amp;quot;Content-Type: application/json&amp;quot; \&lt;br /&gt;
        &amp;quot;http://127.0.0.1:62081/api/v2/location/&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
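A minimal Python sketch that builds the same body as the curl example above and rejects purpose codes outside the list. The helper name is hypothetical, and sending &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt; as a JSON boolean is an assumption:&lt;br /&gt;
&lt;br /&gt;
```python
import json

PURPOSES = {'AR', 'AS', 'CP', 'DS', 'SD', 'SS', 'BL', 'TS', 'RP'}


def location_body(space_uri, pipeline_uris, purpose, relative_path,
                  description='', default=False):
    """Build the JSON body for creating a location, rejecting unknown purposes."""
    if purpose not in PURPOSES:
        raise ValueError('unknown purpose code: %r' % (purpose,))
    return json.dumps({
        'space': space_uri,
        'pipeline': list(pipeline_uris),  # a list, as in the curl example
        'purpose': purpose,
        'relative_path': relative_path,
        'description': description,
        'default': default,  # assumption: sent as a JSON boolean
    })
```
&lt;br /&gt;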
=== Get location details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Move files to this location ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location the files should be moved from&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the [[Storage Service#Pipeline | pipeline]]. Both Locations must be associated with this pipeline.&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dicts, each containing a &amp;lt;code&amp;gt;source&amp;lt;/code&amp;gt; and a &amp;lt;code&amp;gt;destination&amp;lt;/code&amp;gt;. The source is the path of a file to be moved, relative to the origin Location. The destination is the path it should be moved to, relative to this Location.&lt;br /&gt;
&lt;br /&gt;
Intended for creating Transfers, SIPs, and other cases where files need to be moved but not tracked by the Storage Service.&lt;br /&gt;
&lt;br /&gt;
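A minimal Python sketch of the request body. The helper name is hypothetical. It takes (source, destination) pairs so that each move is kept as a single unit, and the body is POSTed to the destination Location's URI:&lt;br /&gt;
&lt;br /&gt;
```python
import json


def move_files_body(origin_location_uri, pipeline_uri, moves):
    """Build the JSON body for moving files into a Location.

    moves is an iterable of (source, destination) path pairs.
    """
    return json.dumps({
        'origin_location': origin_location_uri,
        'pipeline': pipeline_uri,
        'files': [{'source': src, 'destination': dst} for src, dst in moves],
    })
```
&lt;br /&gt;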
=== Browse location path ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/browse/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;: Path inside the Location to list&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;: List of entries (files or directories) at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt;: List of directories at &amp;lt;code&amp;gt;path&amp;lt;/code&amp;gt;. Subset of &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Version 1 returns paths as plain strings; version 2 returns all paths base64-encoded.&lt;br /&gt;
|}&lt;br /&gt;
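&lt;br /&gt;
Because version 2 base64-encodes every path, a client has to decode the listing before using it. The Python sketch below does that for the &amp;lt;code&amp;gt;entries&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;directories&amp;lt;/code&amp;gt; keys. It assumes the decoded paths are UTF-8, and the derived &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt; list (entries that are not directories) is a hypothetical convenience, not part of the API response.&lt;br /&gt;
&lt;br /&gt;
```python
import base64


def decode_browse_response(data):
    # Hypothetical helper for a version 2 browse response.
    # Assumes the underlying paths are UTF-8 once base64-decoded.
    def decode(value):
        return base64.b64decode(value).decode('utf-8')

    entries = [decode(e) for e in data.get('entries', [])]
    directories = [decode(d) for d in data.get('directories', [])]
    dir_set = set(directories)
    # directories is a subset of entries, so the remaining entries are files.
    files = [e for e in entries if e not in dir_set]
    return {'entries': entries, 'directories': directories, 'files': files}
```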
&lt;br /&gt;
=== SWORD collection ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/location/&amp;lt;UUID&amp;gt;/sword/collection/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Package ==&lt;br /&gt;
&lt;br /&gt;
=== Get all packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Create new package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON. Fields for a new package:&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the new package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;transfer&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposit&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from&lt;br /&gt;
** &amp;lt;code&amp;gt;related_package_uuid&amp;lt;/code&amp;gt;: UUID of a package that is related to this one. E.g. UUID of a DIP when storing an AIP&lt;br /&gt;
&lt;br /&gt;
Creates a database entry tracking the package (AIP, transfer, etc.). If the package is an AIP, DIP or AIC and the current_location is an AIP or DIP storage location, it also moves the files from the origin location to the current location. If the package is a transfer and the current_location is a transfer backlog location, the files are also moved.&lt;br /&gt;
&lt;br /&gt;
This is handled through the modified &amp;lt;code&amp;gt;obj_create&amp;lt;/code&amp;gt; function, which calls &amp;lt;code&amp;gt;Package.store_aip&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Package.backlog_transfer&amp;lt;/code&amp;gt;.&lt;br /&gt;
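&lt;br /&gt;
A hedged Python sketch of assembling the JSON body for this POST follows. The field names and the allowed &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt; values come from the list above; the function name is hypothetical, and treating &amp;lt;code&amp;gt;related_package_uuid&amp;lt;/code&amp;gt; as optional is an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
# Allowed values, as listed for package_type above.
PACKAGE_TYPES = {'AIP', 'AIC', 'DIP', 'transfer', 'SIP', 'file', 'deposit'}


def new_package_body(uuid, origin_location, origin_path, current_location,
                     current_path, package_type, size, origin_pipeline,
                     related_package_uuid=None):
    # Hypothetical helper. Fail early on a package_type the endpoint does not accept.
    if package_type not in PACKAGE_TYPES:
        raise ValueError(f'unsupported package_type: {package_type}')
    body = {
        'uuid': uuid,
        'origin_location': origin_location,    # URI of the Location holding it now
        'origin_path': origin_path,            # relative to origin_location
        'current_location': current_location,  # URI of the Location to store it in
        'current_path': current_path,          # relative to current_location
        'package_type': package_type,
        'size': size,
        'origin_pipeline': origin_pipeline,    # URI of the source pipeline
    }
    # Assumed optional: only sent when there is a related package,
    # e.g. the DIP's UUID when storing an AIP.
    if related_package_uuid is not None:
        body['related_package_uuid'] = related_package_uuid
    return body
```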
&lt;br /&gt;
=== Get package details ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
=== Update package contents ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Flag marking this request as a reingest. Reduces the chance of accidentally modifying an AIP.&lt;br /&gt;
** &amp;lt;code&amp;gt;uuid&amp;lt;/code&amp;gt;: UUID of the existing package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_location&amp;lt;/code&amp;gt;: URI of the Location where the package is currently located&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_path&amp;lt;/code&amp;gt;: Path to the package, relative to the origin_location&lt;br /&gt;
** &amp;lt;code&amp;gt;current_location&amp;lt;/code&amp;gt;: URI of the Location where the package should be stored&lt;br /&gt;
** &amp;lt;code&amp;gt;current_path&amp;lt;/code&amp;gt;: Path where the package should be stored, relative to the current_location&lt;br /&gt;
** &amp;lt;code&amp;gt;package_type&amp;lt;/code&amp;gt;: Type of package this is. One of: &amp;lt;code&amp;gt;AIP&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;AIC&amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;size&amp;lt;/code&amp;gt;: Size of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;origin_pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the package is from.  This must be the same pipeline reingest was started on (tracked through &amp;lt;code&amp;gt;Package.misc_attributes.reingest_pipeline&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Updates the contents of a package during reingest.  If the package is an AIP or AIC, currently stored in an AIP storage location, and the 'reingest' parameter is set, it will call &amp;lt;code&amp;gt;Package.finish_reingest&amp;lt;/code&amp;gt; and merge the new AIP with the existing one.&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Update package metadata ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PATCH&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest&amp;lt;/code&amp;gt;: Pipeline UUID or None.&lt;br /&gt;
&lt;br /&gt;
Used to update metadata stored in the database for the package.  Currently, this is used to update the reingest status.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: Currently, this always sets Package.misc_attributes.reingest to None, regardless of what value was actually passed in.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
This is implemented using a modified &amp;lt;code&amp;gt;obj_update&amp;lt;/code&amp;gt; which calls &amp;lt;code&amp;gt;obj_update_hook&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;update_in_place&amp;lt;/code&amp;gt; also helps.&lt;br /&gt;
&lt;br /&gt;
=== Delete package request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/delete_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for deleting the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline the delete request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the deletion. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the deletion.&lt;br /&gt;
&lt;br /&gt;
=== Recover AIP request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/recover_aip/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;event_reason&amp;lt;/code&amp;gt;: Reason for recovering the AIP&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: URI of the pipeline the recovery request is from&lt;br /&gt;
** &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt;: User ID requesting the recovery. This is the ID of the user on the pipeline, and must be an integer greater than 0.&lt;br /&gt;
** &amp;lt;code&amp;gt;user_email&amp;lt;/code&amp;gt;:  Email of the user requesting the recovery.&lt;br /&gt;
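&lt;br /&gt;
Delete and recover requests share the same four fields, so one helper can build either body. This Python sketch (hypothetical function name) enforces the documented rule that &amp;lt;code&amp;gt;user_id&amp;lt;/code&amp;gt; is an integer greater than 0. It passes &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt; through unchanged, because the delete endpoint is documented as taking a pipeline UUID and the recover endpoint a pipeline URI.&lt;br /&gt;
&lt;br /&gt;
```python
def aip_request_body(event_reason, pipeline, user_id, user_email):
    # Hypothetical helper: body for POST .../delete_aip/ or .../recover_aip/.
    # user_id is the user's ID on the pipeline: an integer greater than 0.
    # bool is rejected explicitly because it is a subclass of int in Python.
    if isinstance(user_id, bool) or not isinstance(user_id, int) or not user_id > 0:
        raise ValueError('user_id must be an integer greater than 0')
    return {
        'event_reason': event_reason,
        'pipeline': pipeline,  # UUID for delete_aip/, URI for recover_aip/ (as documented)
        'user_id': user_id,
        'user_email': user_email,
    }
```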
&lt;br /&gt;
=== Download single file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/extract_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt;: Path to the file to download, relative to the package path.&lt;br /&gt;
* '''Response''': Stream of the requested file&lt;br /&gt;
&lt;br /&gt;
Returns a single file from the package. If the package is compressed, the Storage Service downloads and extracts the whole AIP to retrieve the file.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence. Currently, the Storage Service has no way to check whether a file exists except by downloading and extracting the AIP.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
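&lt;br /&gt;
The Python sketch below fetches one file and distinguishes the 202 case from a normal download. The host, credentials and helper names are hypothetical; the query string is built with &amp;lt;code&amp;gt;urllib.parse.urlencode&amp;lt;/code&amp;gt; so that slashes and spaces in &amp;lt;code&amp;gt;relative_path_to_file&amp;lt;/code&amp;gt; are percent-encoded.&lt;br /&gt;
&lt;br /&gt;
```python
import shutil
import urllib.parse
import urllib.request

SS_URL = 'http://127.0.0.1:62081'  # hypothetical Storage Service address
API_AUTH = 'ApiKey test:test'      # hypothetical API user and key


def extract_file_url(package_uuid, relative_path_to_file):
    # The path is relative to the package and travels in the query string.
    query = urllib.parse.urlencode({'relative_path_to_file': relative_path_to_file})
    return f'{SS_URL}/api/v2/file/{package_uuid}/extract_file/?{query}'


def fetch_single_file(package_uuid, relative_path_to_file, dest_path):
    # Hypothetical helper: True if the file was saved, False on a 202 response.
    req = urllib.request.Request(
        extract_file_url(package_uuid, relative_path_to_file),
        headers={'Authorization': API_AUTH},
    )
    with urllib.request.urlopen(req) as resp:
        if resp.status == 202:
            # Not locally available (e.g. Arkivum); the administrator has been emailed.
            return False
        with open(dest_path, 'wb') as out:
            shutil.copyfileobj(resp, out)  # stream to disk instead of reading into memory
    return True
```
&lt;br /&gt;
With default handlers, 4xx and 5xx responses make &amp;lt;code&amp;gt;urlopen&amp;lt;/code&amp;gt; raise &amp;lt;code&amp;gt;urllib.error.HTTPError&amp;lt;/code&amp;gt;, so only success codes such as 200 and 202 reach the status check.&lt;br /&gt;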
&lt;br /&gt;
=== Download package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/download/&amp;lt;chunk number&amp;gt;/&amp;lt;/code&amp;gt; (for [[Storage Service#LOCKSS-o-matic | LOCKSS]] harvesting)&lt;br /&gt;
* '''Verb''': GET, HEAD&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the package&lt;br /&gt;
&lt;br /&gt;
Returns the entire package as a single file. If the package is uncompressed, it is first packed into a single file using &amp;lt;code&amp;gt;tar&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If the download URL has a chunk number, it will attempt to serve the LOCKSS chunk specified for that package. If the package is not in LOCKSS, it will return the whole package.&lt;br /&gt;
&lt;br /&gt;
This responds to HEAD because AtoM uses HEAD to check for the existence of a file. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: HEAD and GET should not perform the same functions. HEAD should be updated to not return the file, and to only check for existence.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If the package is in [[Storage Service#Arkivum | Arkivum]], the package may not actually be available.  This endpoint checks if the package is locally available. If it is, it is returned as normal. If not, it returns &amp;lt;code&amp;gt;202&amp;lt;/code&amp;gt; and emails the administrator about the attempted access.&lt;br /&gt;
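&lt;br /&gt;
The two download URL forms differ only by the optional chunk number. A minimal Python sketch, with a hypothetical helper name and the host from the curl example above:&lt;br /&gt;
&lt;br /&gt;
```python
def download_url(package_uuid, chunk=None, base='http://127.0.0.1:62081'):
    # Hypothetical helper; base defaults to the host used in the curl example.
    # Whole package: streamed as one file (tarred first if it is uncompressed).
    url = f'{base}/api/v2/file/{package_uuid}/download/'
    if chunk is not None:
        # LOCKSS harvesting: ask for one chunk. If the package is not in LOCKSS,
        # the Storage Service returns the whole package instead.
        url += f'{int(chunk)}/'
    return url
```
&lt;br /&gt;
Because HEAD currently behaves like GET here, a HEAD request against either URL is not a cheap existence check.&lt;br /&gt;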
&lt;br /&gt;
=== Get pointer file ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/pointer_file/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': None&lt;br /&gt;
* '''Response''': Stream of the pointer file.&lt;br /&gt;
&lt;br /&gt;
=== Check fixity ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/check_fixity/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Parameters''': Query string parameters&lt;br /&gt;
** &amp;lt;code&amp;gt;force_local&amp;lt;/code&amp;gt;: If true, download and run fixity on the AIP locally, instead of using the Space-provided fixity if available.&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True if the verification succeeded, False if the verification failed, None if the scan could not start&lt;br /&gt;
** &amp;lt;code&amp;gt;message&amp;lt;/code&amp;gt;: Human-readable string explaining the report; it will be empty for successful scans.&lt;br /&gt;
** &amp;lt;code&amp;gt;failures&amp;lt;/code&amp;gt;: List of 0 or more errors&lt;br /&gt;
** &amp;lt;code&amp;gt;timestamp&amp;lt;/code&amp;gt;: ISO-formatted string with the datetime of the last fixity check. If the check was performed by an external system, this will be provided by that system. If not provided, or on error, it will be None.&lt;br /&gt;
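&lt;br /&gt;
Since &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt; is three-valued, a client should test for None before testing truthiness; otherwise a scan that never started would be reported as a failed verification. A Python sketch under that reading of the response fields (the helper name and summary wording are hypothetical):&lt;br /&gt;
&lt;br /&gt;
```python
def summarize_fixity(report):
    # Hypothetical helper. report: decoded JSON from GET /api/v2/file/UUID/check_fixity/
    success = report.get('success')
    message = report.get('message') or ''
    # None must be checked first: it means the scan could not start,
    # which is different from a failed verification (False).
    if success is None:
        return 'not started: ' + message
    if success:
        when = report.get('timestamp') or 'unknown time'
        return f'passed (last checked {when})'
    failures = report.get('failures') or []
    return f'failed with {len(failures)} error(s): ' + message
```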
&lt;br /&gt;
=== AIP storage callback request ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/send_callback/post_store/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
&lt;br /&gt;
Request to call any Callbacks configured to run post-storage for this AIP.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;background-color:#ffeecc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;;&lt;br /&gt;
| Improvement Note: This only works on locally available AIPs (AIPs stored in Spaces that are available via a UNIX filesystem layer).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Get file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET&lt;br /&gt;
* '''Response''': JSON&lt;br /&gt;
** &amp;lt;code&amp;gt;success&amp;lt;/code&amp;gt;: True&lt;br /&gt;
** &amp;lt;code&amp;gt;package&amp;lt;/code&amp;gt;: UUID of the package&lt;br /&gt;
** &amp;lt;code&amp;gt;files&amp;lt;/code&amp;gt;: List of dictionaries with file information. Each dictionary has:&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_id&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
*** &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
*** &amp;lt;code&amp;gt;source_package&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
*** &amp;lt;code&amp;gt;checksum&amp;lt;/code&amp;gt;: Checksum of the file, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
*** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Returns metadata about every file within the package.&lt;br /&gt;
&lt;br /&gt;
=== Update file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': PUT&lt;br /&gt;
* '''Parameters''': JSON list of dictionaries with information on the files to be added. Each dict must have the following attributes:&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
&lt;br /&gt;
Adds a set of files to a package.&lt;br /&gt;
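&lt;br /&gt;
GET and PUT on &amp;lt;code&amp;gt;contents/&amp;lt;/code&amp;gt; name the same attributes differently (for example &amp;lt;code&amp;gt;source_id&amp;lt;/code&amp;gt; versus &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;). The Python sketch below, with a hypothetical helper name, converts a GET response into a PUT body using the two field lists above; &amp;lt;code&amp;gt;checksum&amp;lt;/code&amp;gt; is dropped because PUT does not list it.&lt;br /&gt;
&lt;br /&gt;
```python
# GET .../contents/ field name -> PUT .../contents/ field name
GET_TO_PUT = {
    'name': 'relative_path',
    'source_id': 'fileuuid',
    'source_package': 'sipuuid',
    'accessionid': 'accessionid',
    'origin': 'origin',
}


def contents_put_body(get_response):
    # Hypothetical helper: build the list of dicts PUT expects. Missing values
    # become empty strings, as the API does for an absent accession number.
    return [
        {put_key: entry.get(get_key, '') for get_key, put_key in GET_TO_PUT.items()}
        for entry in get_response['files']
    ]
```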
&lt;br /&gt;
=== Delete file information for package ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/contents/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': DELETE&lt;br /&gt;
&lt;br /&gt;
Removes all file records associated with this package.&lt;br /&gt;
&lt;br /&gt;
=== Query file information on packages ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/metadata/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': GET, POST&lt;br /&gt;
* '''Parameters''': Query string parameters. At least one of the following is required; the others are optional.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
* '''Response''': JSON. List of dicts with file information about the files that match the query.&lt;br /&gt;
** &amp;lt;code&amp;gt;accessionid&amp;lt;/code&amp;gt;: Accession number, or an empty string&lt;br /&gt;
** &amp;lt;code&amp;gt;file_extension&amp;lt;/code&amp;gt;: File extension&lt;br /&gt;
** &amp;lt;code&amp;gt;filename&amp;lt;/code&amp;gt;: Name of the file, without its path.&lt;br /&gt;
** &amp;lt;code&amp;gt;relative_path&amp;lt;/code&amp;gt;: Relative path of the file inside the package&lt;br /&gt;
** &amp;lt;code&amp;gt;fileuuid&amp;lt;/code&amp;gt;: UUID of the file to index&lt;br /&gt;
** &amp;lt;code&amp;gt;sipuuid&amp;lt;/code&amp;gt;: UUID of the SIP this file is from&lt;br /&gt;
** &amp;lt;code&amp;gt;origin&amp;lt;/code&amp;gt;: UUID of the Archivematica dashboard this is from&lt;br /&gt;
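&lt;br /&gt;
The query needs at least one of four parameters. This Python sketch (hypothetical helper and host) checks the parameter names and drops empty values before building a GET URL.&lt;br /&gt;
&lt;br /&gt;
```python
import urllib.parse

QUERY_FIELDS = ('relative_path', 'fileuuid', 'accessionid', 'sipuuid')


def metadata_query_url(base='http://127.0.0.1:62081', **criteria):
    # Hypothetical helper. Reject names the endpoint does not document.
    unknown = set(criteria) - set(QUERY_FIELDS)
    if unknown:
        raise ValueError(f'unsupported parameters: {sorted(unknown)}')
    # Drop empty values; at least one parameter must remain.
    params = {key: value for key, value in criteria.items() if value}
    if not params:
        raise ValueError('provide at least one of: ' + ', '.join(QUERY_FIELDS))
    return f'{base}/api/v2/file/metadata/?' + urllib.parse.urlencode(params)
```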
&lt;br /&gt;
=== Reingest AIP ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/reingest/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''Verb''': POST&lt;br /&gt;
* '''Parameters''': JSON body&lt;br /&gt;
** &amp;lt;code&amp;gt;pipeline&amp;lt;/code&amp;gt;: UUID of the pipeline to reingest on&lt;br /&gt;
** &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt;: Type of reingest to start. One of &amp;lt;code&amp;gt;METADATA_ONLY&amp;lt;/code&amp;gt; (metadata-only reingest), &amp;lt;code&amp;gt;OBJECTS&amp;lt;/code&amp;gt; (partial reingest), &amp;lt;code&amp;gt;FULL&amp;lt;/code&amp;gt; (full reingest)&lt;br /&gt;
** &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt;: Optional. Name of the processing configuration to use on full reingest&lt;br /&gt;
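&lt;br /&gt;
A hedged Python sketch of the reingest request body. The three &amp;lt;code&amp;gt;reingest_type&amp;lt;/code&amp;gt; values come from the list above; rejecting &amp;lt;code&amp;gt;processing_config&amp;lt;/code&amp;gt; for anything but a full reingest is a deliberately strict, hypothetical choice, since the parameter is only documented for full reingest.&lt;br /&gt;
&lt;br /&gt;
```python
REINGEST_TYPES = {'METADATA_ONLY', 'OBJECTS', 'FULL'}


def reingest_body(pipeline_uuid, reingest_type, processing_config=None):
    # Hypothetical helper for POST /api/v2/file/UUID/reingest/.
    if reingest_type not in REINGEST_TYPES:
        raise ValueError(f'reingest_type must be one of {sorted(REINGEST_TYPES)}')
    body = {'pipeline': pipeline_uuid, 'reingest_type': reingest_type}
    if processing_config is not None:
        # Strict (assumed) rule: processing_config only applies to FULL reingest.
        if reingest_type != 'FULL':
            raise ValueError('processing_config is only used on FULL reingest')
        body['processing_config'] = processing_config
    return body
```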
&lt;br /&gt;
=== SWORD endpoints ===&lt;br /&gt;
&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/media/&amp;lt;/code&amp;gt;&lt;br /&gt;
* '''URL''': &amp;lt;code&amp;gt;/api/v2/file/&amp;lt;UUID&amp;gt;/sword/state/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [[Sword API]] for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Development documentation]]&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12485</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12485"/>
		<updated>2018-05-15T19:43:21Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: Added future considerations section&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early proof-of-concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (exact release to be confirmed). This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL).&lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
==Overview==&lt;br /&gt;
This wiki captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''May 11, 2018'''&lt;br /&gt;
For the current status of the work and any outstanding issues, see the Waffle board linked below:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. &lt;br /&gt;
&lt;br /&gt;
The early drafts are documented in this Google Doc: [http://docs.behat.org/en/v2.5/guides/1.gherkin.html]&lt;br /&gt;
Once the draft has been reviewed, we will publish it to our acceptance test repository on GitHub.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
'''April 24, 2017'''&lt;br /&gt;
&lt;br /&gt;
This feature requires a development branch of Archivematica, which can be installed with the following steps:&lt;br /&gt;
&lt;br /&gt;
# Install deploy-pub: https://github.com/artefactual/deploy-pub&lt;br /&gt;
# Use the archivematica-centos7 playbook in deploy-pub: https://github.com/artefactual/deploy-pub/tree/master/playbooks/archivematica-centos7&lt;br /&gt;
# Create a hosts file that lists your target machine (see the Digital Ocean example linked from the playbook).&lt;br /&gt;
# In requirements.yml, change the version of ansible-archivematica-src to &amp;quot;stable/1.6.x&amp;quot;.&lt;br /&gt;
# Change singlenode.yml to point to the host you defined in your hosts file.&lt;br /&gt;
# Change vars-singlenode.yml to include the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# required for dataverse testing&lt;br /&gt;
archivematica_src_am_version: &amp;quot;dev/dataverse-poc&amp;quot;&lt;br /&gt;
archivematica_src_automationtools: &amp;quot;yes&amp;quot;&lt;br /&gt;
archivematica_src_automationtools_version: &amp;quot;dev/dataverse&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Workflow==&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
*The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing Archivematica Archival Information Packages (AIPs) and placing them in archival storage, &amp;lt;strike&amp;gt; and updating the Dataverse study with the AIP UUIDs &amp;lt;/strike&amp;gt; (this was determined to be out of scope). &lt;br /&gt;
*Analysis is based on Dataverse tests using [https://apitest.dataverse.org https://apitest.dataverse.org] and [https://demo.dataverse.org https://demo.dataverse.org], online documentation at http://guides.dataverse.org/en/latest/api/index.html and discussions with Dataverse developers and users. &lt;br /&gt;
*Proposed integration is for Archivematica 1.5 and higher and Dataverse 4.x.&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram===&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse - Archivematica workflow_1.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram notes===&lt;br /&gt;
&lt;br /&gt;
[1] &amp;quot;Ingest script&amp;quot; refers to an [https://github.com/artefactual/automation-tools automation tool] designed to automate ingest into Archivematica for bulk processing. An existing automation tool would be modified to accomplish the tasks described in the workflow.&lt;br /&gt;
&lt;br /&gt;
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.&lt;br /&gt;
&lt;br /&gt;
[3] The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
&lt;br /&gt;
[4] If the JSON file has a content_type of tab-separated values, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. For TSV files, this returns a zipped package containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file, and file citations in EndNote and RIS formats.&lt;br /&gt;
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different filetypes and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The EndNote and RIS files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI XML file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEntity&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|Config setting in ingest tool&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with .json or .ris extension, any -ddi.xml files and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
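&lt;br /&gt;
The fileGrp rules and the identifier mapping in the table can be sketched in Python. This is an illustration, not the ingest tool's code: the function names are hypothetical, &amp;lt;code&amp;gt;dataset&amp;lt;/code&amp;gt; is assumed to be a dict exposing the &amp;lt;code&amp;gt;authority&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;identifier&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;protocol&amp;lt;/code&amp;gt; keys shown in the json examples, and the caller is assumed to know whether a file came from a bundle and whether it is the researcher's original upload.&lt;br /&gt;
&lt;br /&gt;
```python
# File name endings the table assigns to the metadata fileGrp.
METADATA_SUFFIXES = ('.json', '.ris', '-ddi.xml', '-endnote.xml')


def file_group(filename, from_bundle=False, is_original_upload=False):
    # Hypothetical helper. Metadata files (JSON, RIS, DDI XML, EndNote XML) come first.
    if filename.lower().endswith(METADATA_SUFFIXES):
        return 'metadata'
    # Stand-alone data files and the researcher's uploaded file are originals.
    if not from_bundle or is_original_upload:
        return 'original'
    # Anything else unpacked from a bundle was generated by Dataverse.
    return 'derivative'


def ddi_idno(dataset):
    # Hypothetical helper. ddi:IDNo combines authority and identifier;
    # its agency attribute is the protocol.
    # rstrip tolerates an authority given with or without a trailing slash.
    value = dataset['authority'].rstrip('/') + '/' + dataset['identifier']
    return {'IDNo': value, 'agency': dataset['protocol']}
```
&lt;br /&gt;
Applied to the sample study above, &amp;lt;code&amp;gt;file_group&amp;lt;/code&amp;gt; reproduces the grouping described for the sample METS file: the PDF and SAV files are original, the TAB and RData files derivative, and the JSON, DDI, EndNote and RIS files metadata.&lt;br /&gt;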
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse Agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica.&lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
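The derivation event in the example above could be generated with the standard library's ElementTree. This is a sketch only: it assumes the PREMIS 2 namespace (info:lc/xmlns/premis-v2), which matches the relatedObjectIdentification element used in these examples, and the function name derivation_event is hypothetical. The event UUID is generated at ingest, while the linking agent is the Dataverse URI.&lt;br /&gt;
```python
import uuid
import xml.etree.ElementTree as ET

# Assumption: PREMIS 2 namespace, consistent with the element names above.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return "{%s}%s" % (PREMIS_NS, tag)


def derivation_event(event_date, agent_uri):
    """Build a PREMIS derivation event attributed to a Dataverse agent URI."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    # Stands in for the event UUID assigned by Archivematica at ingest.
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())
    ET.SubElement(event, q("eventType")).text = "derivation"
    ET.SubElement(event, q("eventDateTime")).text = event_date
    link = ET.SubElement(event, q("linkingAgentIdentifier"))
    ET.SubElement(link, q("linkingAgentIdentifierType")).text = "URI"
    ET.SubElement(link, q("linkingAgentIdentifierValue")).text = agent_uri
    return event


if __name__ == "__main__":
    event = derivation_event("2015-08-21", "http://dataverse.scholarsportal.info/dvn/")
    print(ET.tostring(event, encoding="unicode"))
```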
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
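The fixity check event above records the result of recomputing a file's checksum and comparing it with the value Dataverse supplied. A minimal Python sketch of that comparison follows. The sample event names hashlib.sha256(), while the metadata sources table gives md5 as the checksum Dataverse provides in dataset.json, so the sketch takes the algorithm as a parameter (defaulting to md5); the function name and its return shape are assumptions.&lt;br /&gt;
```python
import hashlib


def verify_dataverse_checksum(path, expected, algorithm="md5", chunk_size=65536):
    """Recompute a file's digest and compare it with the checksum from Dataverse.

    Returns an (outcome, detail) pair shaped like the eventOutcome and
    eventOutcomeDetailNote values in the fixity check event.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Read in chunks so large data files are never held in memory at once.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual == expected.strip().lower():
        return "Pass", "Dataverse checksum %s verified" % expected
    return "Fail", "Dataverse checksum %s does not match %s" % (expected, actual)
```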
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest will have the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. There are additional metadata files that are included in a Dataverse-derived AIP, and each zipped bundle that is included in the ingest will result in a separate directory in the AIP. The following is a sample structure.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt &lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
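Because the AIP is a BagIt bag, its payload can be checked against manifest-sha512.txt, where each line holds a digest followed by a path relative to the bag root. The Python sketch below assumes an uncompressed (already extracted) bag and checks only the payload manifest; for full validation, including tagmanifest-md5.txt and bag-info.txt, the Library of Congress bagit-python package is the more complete option. The function name is hypothetical.&lt;br /&gt;
```python
import hashlib
import os


def verify_bag_manifest(bag_dir, algorithm="sha512"):
    """Check each manifest-ALGORITHM.txt entry against the payload on disk.

    Returns the relative paths whose digests do not match; an empty list
    means every listed payload file verified.
    """
    manifest = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    mismatches = []
    with open(manifest, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue
            # Each line is a digest, whitespace, then a path that may contain spaces.
            expected, relpath = line.split(None, 1)
            digest = hashlib.new(algorithm)
            with open(os.path.join(bag_dir, relpath), "rb") as payload:
                for chunk in iter(lambda: payload.read(65536), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                mismatches.append(relpath)
    return mismatches
```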
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full json file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
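In the sample above, the AIP package name ends with the same UUID that names the AIP METS file inside the data directory. Assuming that naming pattern holds, a short Python sketch can locate the METS file from the package name; the function aip_mets_path is hypothetical, and the pattern is inferred from this sample rather than from a documented specification.&lt;br /&gt;
```python
import re

# Name, then "-" and a lowercase UUID, then an optional extension such as .7z.
AIP_NAME = re.compile(
    r"^(.+)-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(\.[A-Za-z0-9.]+)?$"
)


def aip_mets_path(aip_name):
    """Split an AIP name into (name, UUID, METS path relative to the bag root)."""
    match = AIP_NAME.match(aip_name)
    if match is None:
        raise ValueError("not an AIP name: %s" % aip_name)
    name, aip_uuid = match.group(1), match.group(2)
    return name, aip_uuid, "data/METS.%s.xml" % aip_uuid
```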
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP, and indicates the relationships between the various files in the AIP. A sample AIP METS file would be structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the transfer METS file, as follows:&lt;br /&gt;
--ddi:titl&lt;br /&gt;
--ddi:IDNo&lt;br /&gt;
--ddi:AuthEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;br /&gt;
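The fileSec part of this outline groups files by their USE value. The Python sketch below builds that skeleton with ElementTree from a mapping of USE values to file paths, emitting groups in the order listed above and skipping empty ones. The sequential file IDs and the FLocat attributes are simplifying assumptions for illustration, not a description of how Archivematica itself writes the AIP METS file.&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

# fileGrp order follows the AIP METS outline.
GROUP_ORDER = ("original", "derivative", "submissionDocumentation", "preservation", "metadata")


def build_file_sec(files_by_use):
    """Build a METS fileSec with one fileGrp per USE value that has files."""
    file_sec = ET.Element("{%s}fileSec" % METS_NS)
    counter = 0
    for use in GROUP_ORDER:
        paths = files_by_use.get(use, [])
        if not paths:
            continue  # Omit groups with no files.
        group = ET.SubElement(file_sec, "{%s}fileGrp" % METS_NS, USE=use)
        for path in paths:
            counter += 1
            # Hypothetical sequential IDs; a real AIP METS file uses its own scheme.
            entry = ET.SubElement(group, "{%s}file" % METS_NS, ID="file-%d" % counter)
            ET.SubElement(
                entry,
                "{%s}FLocat" % METS_NS,
                {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM", "{%s}href" % XLINK_NS: path},
            )
    return file_sec


if __name__ == "__main__":
    sec = build_file_sec({
        "original": ["objects/Weather_data.sav"],
        "derivative": ["objects/Weather_data/Weather_data.tab"],
        "metadata": ["objects/Weather_data/Weather_data-ddi.xml"],
    })
    print(ET.tostring(sec, encoding="unicode"))
```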
&lt;br /&gt;
&lt;br /&gt;
== Future Requirements &amp;amp; Considerations ==&lt;br /&gt;
This section includes working notes for future phases, as interesting opportunities or questions arise. At the end of the current phase we will be documenting the integration as well as future opportunities. &lt;br /&gt;
&lt;br /&gt;
=== Notes from Feature File review meeting on May 1 2018 (2pm EST) ===&lt;br /&gt;
&lt;br /&gt;
'''Choice &amp;amp; Versioning of Dataverse API:''' &lt;br /&gt;
The Dataverse Search and Access APIs are not currently versioned. &lt;br /&gt;
The Native API is versioned: http://guides.dataverse.org/en/latest/api/native-api.html&lt;br /&gt;
There is an OAI-PMH interface (although it is not mentioned in the Dataverse API guide). Amber said there were idiosyncrasies in the way Dataverse implemented OAI-PMH, and was not sure it would be a ‘safe’ option. &lt;br /&gt;
Amaz would like to see that we are either using a standard API (like OAI-PMH) or a versioned API. &lt;br /&gt;
Amaz wondered whether we could use OAI-PMH for the polling part of the solution, but given what Amber said, it doesn’t seem like a good way to go.&lt;br /&gt;
So as part of the project we need to see whether we could use the Native API (even if we don’t actually use it), or we need to raise it as an issue to discuss with the dataverse team.   &lt;br /&gt;
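As a concrete illustration of calling the versioned Native API rather than the unversioned Search and Access APIs, the Python sketch below builds (without sending) a request for a single dataset by its persistent identifier. The /api/v1/datasets/:persistentId/ path and the X-Dataverse-key header are taken from the Dataverse Native API guide as we understand it and should be checked against the guide for the Dataverse version in use; the function name is hypothetical.&lt;br /&gt;
```python
from urllib.parse import urlencode
from urllib.request import Request

API_VERSION = "v1"  # Assumption: the Native API version to pin requests to.


def dataset_request(server, persistent_id, api_key=None):
    """Build (but do not send) a versioned Native API request for one dataset."""
    query = urlencode({"persistentId": persistent_id})
    url = "%s/api/%s/datasets/:persistentId/?%s" % (server.rstrip("/"), API_VERSION, query)
    headers = {"X-Dataverse-key": api_key} if api_key else {}
    return Request(url, headers=headers)


if __name__ == "__main__":
    # Persistent ID assembled from the protocol, authority and identifier
    # values in the metadata sources table example.
    request = dataset_request("https://demo.dataverse.org", "doi:10.5072/FK2/0MOPJM")
    print(request.full_url)
```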
&lt;br /&gt;
'''Relationships between Datasets'''&lt;br /&gt;
Amber pointed out that they are not yet sure exactly which datasets should be preserved, and expects this to vary quite a bit by institution. &lt;br /&gt;
We discussed the question of whether all datasets in a dataverse would be preserved (not currently known), which brought up the question of how to relate datasets. &lt;br /&gt;
We talked about AICs as one possible solution, but agreed that this is a new feature that needs to be thought through; there could be solutions other than AICs. &lt;br /&gt;
&lt;br /&gt;
'''Improving agent info in event history in METS'''&lt;br /&gt;
We pointed out that having an agent other than Archivematica in the METS is a new feature.&lt;br /&gt;
We discussed making this even more specific by adding more agents, for instance differentiating between the researcher who uploaded files and the research data manager who published the dataset. &lt;br /&gt;
&lt;br /&gt;
'''Notes from Dataverse Testing:''' &lt;br /&gt;
&lt;br /&gt;
Should a preserved dataset include an equivalent of fixity check on any UNFs created by Dataverse? &lt;br /&gt;
https://dataverse.scholarsportal.info/guides/en/4.8.6/developers/unf/index.html#unf&lt;br /&gt;
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed.&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
	<entry>
		<id>https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12484</id>
		<title>Dataverse</title>
		<link rel="alternate" type="text/html" href="https://wiki.archivematica.org/index.php?title=Dataverse&amp;diff=12484"/>
		<updated>2018-05-15T19:19:43Z</updated>

		<summary type="html">&lt;p&gt;Joel-simpson: Added link to feature file&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Main Page]] &amp;gt; [[Documentation]] &amp;gt; [[Requirements]] &amp;gt; Dataverse&lt;br /&gt;
&lt;br /&gt;
This page sets out the requirements and designs for integration with [http://dataverse.org Dataverse]. &lt;br /&gt;
&lt;br /&gt;
This page was originally created as part of an early Proof of Concept integration in 2017, which was only made available in a development branch of Archivematica. We have now started a phase 2 project to improve on that original integration work and merge it into a public release of Archivematica (exact release to be confirmed). This work is being sponsored by [https://scholarsportal.info/ Scholars Portal], a service of the Ontario Council of University Libraries (OCUL). &lt;br /&gt;
&lt;br /&gt;
[[Category:Feature requirements]]&lt;br /&gt;
&lt;br /&gt;
===See also===&lt;br /&gt;
&lt;br /&gt;
* [[Sword API]]&lt;br /&gt;
* [[Dataset preservation]]&lt;br /&gt;
&lt;br /&gt;
==Overview==&lt;br /&gt;
This wiki captures requirements for ingesting studies (datasets) from Dataverse into Archivematica for long-term preservation.&lt;br /&gt;
&lt;br /&gt;
==Current Status==&lt;br /&gt;
&lt;br /&gt;
'''May 11, 2018'''&lt;br /&gt;
To see the current status of work and any outstanding issues, please see the Waffle board linked [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse below]:&lt;br /&gt;
&lt;br /&gt;
* [https://waffle.io/artefactual/archivematica?label=OCUL:%20AM-Dataverse Waffle board for the Dataverse Feature]&lt;br /&gt;
&lt;br /&gt;
==Feature Files==&lt;br /&gt;
On this project we are using [http://docs.behat.org/en/v2.5/guides/1.gherkin.html Gherkin] feature files to define the desired behaviour of preserving a dataset from a Dataverse.  Feature files are also known as Acceptance Tests, because they specify the behaviour that we will test at the end of the project. &lt;br /&gt;
&lt;br /&gt;
The early drafts are documented in this Google Doc: [http://docs.behat.org/en/v2.5/guides/1.gherkin.html]&lt;br /&gt;
Once the draft has been reviewed we will publish it to our acceptance test repository on GitHub. &lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
April 24, 2017&lt;br /&gt;
This feature requires a development branch of Archivematica, which can be installed with the following steps:&lt;br /&gt;
&lt;br /&gt;
# Install deploy-pub: https://github.com/artefactual/deploy-pub&lt;br /&gt;
# Use the archivematica-centos7 playbook in deploy-pub: https://github.com/artefactual/deploy-pub/tree/master/playbooks/archivematica-centos7&lt;br /&gt;
# Create a hosts file that lists your target machine (see the DigitalOcean example linked from the playbook).&lt;br /&gt;
# In requirements.yml, change the version of ansible-archivematica-src to &amp;quot;stable/1.6.x&amp;quot;.&lt;br /&gt;
# Change singlenode.yml to point to the host you defined in your hosts file.&lt;br /&gt;
# Change vars-singlenode.yml to include the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# required for dataverse testing&lt;br /&gt;
archivematica_src_am_version: &amp;quot;dev/dataverse-poc&amp;quot;&lt;br /&gt;
archivematica_src_automationtools: &amp;quot;yes&amp;quot;&lt;br /&gt;
archivematica_src_automationtools_version: &amp;quot;dev/dataverse&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Workflow==&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
*The proposed workflow consists of issuing API calls to Dataverse, receiving content (data files and metadata) for ingest into Archivematica, preparing Archivematica Archival Information Packages (AIPs) and placing them in archival storage, &amp;lt;strike&amp;gt; and updating the Dataverse study with the AIP UUIDs &amp;lt;/strike&amp;gt; (this was determined to be out of scope). &lt;br /&gt;
*Analysis is based on Dataverse tests using [https://apitest.dataverse.org https://apitest.dataverse.org] and [https://demo.dataverse.org https://demo.dataverse.org], online documentation at http://guides.dataverse.org/en/latest/api/index.html and discussions with Dataverse developers and users. &lt;br /&gt;
*Proposed integration is for Archivematica 1.5 and higher and Dataverse 4.x.&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram===&lt;br /&gt;
This section is from the first phase project in 2017 and needs to be updated. &lt;br /&gt;
&lt;br /&gt;
[[File:Dataverse - Archivematica workflow_1.png|800px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
===Workflow diagram notes===&lt;br /&gt;
&lt;br /&gt;
[1] &amp;quot;Ingest script&amp;quot; refers to an [https://github.com/artefactual/automation-tools automation tool] designed to automate ingest into Archivematica for bulk processing. An existing automation tool would be modified to accomplish the tasks described in the workflow.&lt;br /&gt;
&lt;br /&gt;
[2] A new or updated study is one that has been published, either for the first time or as a new version, since the last API call.&lt;br /&gt;
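Note [2] implies that the ingest script remembers when it last polled Dataverse. A minimal Python sketch of that filter follows; it assumes each dataset version carries a releaseTime value (the field the metadata sources table uses for the DDI version date) in a UTC format like 2018-05-11T14:05:00Z, and the function name is hypothetical.&lt;br /&gt;
```python
from datetime import datetime

# Assumed timestamp format for releaseTime values in the dataset JSON.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def new_or_updated(versions, last_poll):
    """Return dataset versions released after last_poll (a naive UTC datetime)."""
    return [
        version
        for version in versions
        if datetime.strptime(version["releaseTime"], TIME_FORMAT) > last_poll
    ]
```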
&lt;br /&gt;
[3] The json file contains citation and other study-level metadata, an entity_id field that is used to identify the study in Dataverse, version information, a list of data files with their own entity_id values, and md5 checksums for each data file.&lt;br /&gt;
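The fields described in note [3] can be pulled out of the parsed JSON with a few lookups. The sketch below assumes the Dataverse 4 native JSON layout (data, then latestVersion, then files, each with a dataFile object holding id, filename and md5), in which the study identifier that note [3] calls entity_id appears as id; these key names are assumptions to verify against a real response, and the function name is hypothetical.&lt;br /&gt;
```python
def summarize_dataset(dataset_json):
    """Return (dataset id, version label, {file id: (filename, md5)}).

    Key names assume the Dataverse 4 native JSON layout; other releases may
    report checksums under a nested "checksum" object instead of "md5".
    """
    data = dataset_json["data"]
    version = data["latestVersion"]
    files = {}
    for entry in version["files"]:
        datafile = entry["dataFile"]
        files[datafile["id"]] = (datafile["filename"], datafile.get("md5"))
    label = "%s.%s" % (version["versionNumber"], version["versionMinorNumber"])
    return data["id"], label, files
```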
&lt;br /&gt;
[4] If the json file lists a content_type of tab-separated values for a data file, Archivematica issues an API call for a multiple-file (&amp;quot;bundled&amp;quot;) content download. This returns a zipped package for TSV files containing the .tab file, the original uploaded file, several other derivative formats, a DDI XML file and file citations in EndNote and RIS formats.&lt;br /&gt;
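The choice in note [4] between a single-file download and a bundle can be sketched in Python as below. It assumes the Access API paths /api/access/datafile/{id} and /api/access/datafile/bundle/{id} and a contentType value of text/tab-separated-values on the dataFile entry; both should be checked against the Dataverse API guide for the installed version. The function name is hypothetical.&lt;br /&gt;
```python
TSV = "text/tab-separated-values"


def download_url(server, datafile):
    """Pick an Access API download URL for one dataFile entry."""
    base = server.rstrip("/")
    if datafile.get("contentType") == TSV:
        # Tabular file: request the zipped bundle (original, TAB, derivatives,
        # DDI XML and citation files).
        return "%s/api/access/datafile/bundle/%s" % (base, datafile["id"])
    return "%s/api/access/datafile/%s" % (base, datafile["id"])
```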
&lt;br /&gt;
[5] The METS file will consist of a dmdSec containing the DC elements extracted from the json file, and a fileSec and structMap indicating the relationships between the files in the transfer (e.g. original uploaded data file, derivative files generated for tabular data, metadata/citation files). This will allow Archivematica to apply appropriate preservation micro-services to different filetypes and provide an accurate representation of the study in the AIP METS file (step 1.9).&lt;br /&gt;
&lt;br /&gt;
[6] Archivematica ingests all content returned from Dataverse, including the json file, plus the METS file generated in step 1.6.&lt;br /&gt;
&lt;br /&gt;
[7] Standard and pre-configured micro-services include: assign UUID, verify checksums, generate checksums, extract packages, scan for viruses, clean up filenames, identify formats, validate formats, extract metadata and normalize for preservation.&lt;br /&gt;
&lt;br /&gt;
== Transfer METS file ==&lt;br /&gt;
&lt;br /&gt;
When the ingest script retrieves content from Dataverse, it generates a METS file to allow Archivematica to understand the contents of the transfer and the relationships between its various data and metadata files.&lt;br /&gt;
&lt;br /&gt;
=== Sample transfer METS file ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Original Dataverse study retrieved through API call:&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*dataset.json (a JSON file generated by Dataverse consisting of study-level metadata and information about data files)&lt;br /&gt;
*Study_info.pdf (a non-tabular data file)&lt;br /&gt;
*A zipped bundle consisting of the following:&lt;br /&gt;
**YVR_weather_data.sav (an SPSS SAV file uploaded by the researcher)&lt;br /&gt;
**YVR_weather_data.tab (a TAB file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data.RData (an R file generated from the SPSS SAV file by Dataverse)&lt;br /&gt;
**YVR_weather_data-ddi.xml, YVR_weather_datacitation-endnote.xml, and YVR_weather_datacitation-ris.ris (three metadata files generated for the TAB file by Dataverse)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Resulting transfer METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*The fileSec in the METS file consists of three file groups, USE=&amp;quot;original&amp;quot; (the PDF and SAV files); USE=&amp;quot;derivative&amp;quot; (the TAB and R files); and USE=&amp;quot;metadata&amp;quot; (the JSON file and the three metadata files from the zipped bundle).&lt;br /&gt;
*All of the files unpacked from the Dataverse bundle have a GROUPID attribute to indicate the relationship between them. If the transfer had consisted of more than one bundle, each set of unpacked files would have its own GROUPID.&lt;br /&gt;
*Three dmdSecs have been generated:&lt;br /&gt;
**dmdSec_1, consisting of a small number of study-level DDI terms&lt;br /&gt;
**dmdSec_2, consisting of an mdRef to the JSON file&lt;br /&gt;
**dmdSec_3, consisting of an mdRef to the DDI XML file&lt;br /&gt;
*In the structMap, dmdSec_1 and dmdSec_2 are linked to the study as a whole, while dmdSec_3 is linked to the TAB file. The endnote and ris files have not been made into dmdSecs because they contain small subsets of metadata which are already captured in dmdSec_1 and the DDI xml file.&lt;br /&gt;
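The GROUPID rule (every file unpacked from the same bundle shares one value, and each bundle gets its own) can be sketched in Python as follows. Deriving the value from a random UUID, and the Group- prefix, are assumptions for illustration; the ingest tool may generate group IDs differently.&lt;br /&gt;
```python
import uuid


def assign_group_ids(bundles):
    """Map each unpacked file to a GROUPID shared with the rest of its bundle.

    bundles maps a bundle name to the list of files extracted from it.
    """
    group_ids = {}
    for members in bundles.values():
        group_id = "Group-%s" % uuid.uuid4()  # Hypothetical ID format.
        for member in members:
            group_ids[member] = group_id
    return group_ids
```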
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:METS1G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS2G.png|900px|thumb|center]]&lt;br /&gt;
[[File:METS3G.png|900px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata sources for METS file&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!style=&amp;quot;width:15%&amp;quot;|'''METS element'''&lt;br /&gt;
!style=&amp;quot;width:25%&amp;quot;|'''Information source'''&lt;br /&gt;
!style=&amp;quot;width:40%&amp;quot;|'''Notes'''&lt;br /&gt;
|-&lt;br /&gt;
|ddi:titl&lt;br /&gt;
|json: citation/typeName: &amp;quot;title&amp;quot;, value: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo&lt;br /&gt;
|json: authority, identifier&lt;br /&gt;
|json example: &amp;quot;authority&amp;quot;: &amp;quot;10.5072/FK2/&amp;quot;, &amp;quot;identifier&amp;quot;: &amp;quot;0MOPJM&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:IDNo agency attribute&lt;br /&gt;
|json: protocol&lt;br /&gt;
|json example: &amp;quot;protocol&amp;quot;: &amp;quot;doi&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|ddi:AuthEnty&lt;br /&gt;
|json: citation/typeName: &amp;quot;authorName&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:distrbtr&lt;br /&gt;
|Config setting in ingest tool&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version date attribute&lt;br /&gt;
|json: &amp;quot;releaseTime&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version type attribute&lt;br /&gt;
|json: &amp;quot;versionState&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:version&lt;br /&gt;
|json: &amp;quot;versionNumber&amp;quot;, &amp;quot;versionMinorNumber&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|ddi:restrctn&lt;br /&gt;
|json: &amp;quot;termsOfUse&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;original&amp;quot;&lt;br /&gt;
|json: datafile&lt;br /&gt;
|Each non-tabular data file is listed as a datafile in the files section. Each TAB file derived by Dataverse for uploaded tabular file formats is also listed as a datafile, with the original file uploaded by the researcher indicated by &amp;quot;originalFileFormat&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
|All files that are included in a bundle, except for the original file and the metadata files (see below).&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
|Any files with a .json or .ris extension, plus any -ddi.xml and -endnote.xml files&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUM&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;: [value]&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|CHECKSUMTYPE&lt;br /&gt;
|json: datafile/&amp;quot;md5&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|GROUPID&lt;br /&gt;
|Generated by ingest tool. Each file unpacked from a bundle is given the same group id.&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
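The DDI rows of the table can be read as a mapping from dataset JSON keys to DDI elements. The Python sketch below applies that mapping; it assumes the Dataverse 4 layout (citation fields under latestVersion, metadataBlocks, citation, fields, with author names nested inside the author field) and builds the IDNo by concatenating authority and identifier as in the table's example. The function name and the flat output keys are hypothetical.&lt;br /&gt;
```python
def ddi_fields(dataset_json, distributor):
    """Collect the study-level DDI values listed in the metadata sources table.

    distributor comes from the ingest tool's configuration, not from Dataverse.
    """
    data = dataset_json["data"]
    version = data["latestVersion"]
    citation = {
        field["typeName"]: field["value"]
        for field in version["metadataBlocks"]["citation"]["fields"]
    }
    return {
        "titl": citation.get("title"),
        "IDNo": data["authority"] + data["identifier"],
        "IDNo agency": data["protocol"],
        "AuthEnty": [author["authorName"]["value"] for author in citation.get("author", [])],
        "distrbtr": distributor,
        "version": "%s.%s" % (version["versionNumber"], version["versionMinorNumber"]),
        "version date": version.get("releaseTime"),
        "version type": version.get("versionState"),
        "restrctn": version.get("termsOfUse"),
    }
```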
&lt;br /&gt;
&amp;lt;/br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== AIP METS file ==&lt;br /&gt;
&lt;br /&gt;
=== Basic METS file structure ===&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) METS file will follow the basic structure for a standard Archivematica AIP METS file described at [[METS]]. A new fileGrp USE=&amp;quot;derivative&amp;quot; will be added to indicate TAB, RData and other derivatives generated by Dataverse for uploaded tabular data format files.&lt;br /&gt;
&lt;br /&gt;
=== dmdSecs in AIP METS file ===&lt;br /&gt;
&lt;br /&gt;
The dmdSecs in the transfer METS file will be copied over to the AIP METS file.&lt;br /&gt;
&lt;br /&gt;
=== Additions to PREMIS for derivative files ===&lt;br /&gt;
&lt;br /&gt;
In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse will be described using PREMIS relationship semantic units. A PREMIS derivation event will be added to indicate that the derivative file was generated from the original file, and a Dataverse agent will be added to indicate that the event was carried out by Dataverse prior to ingest, rather than by Archivematica.&lt;br /&gt;
&lt;br /&gt;
'''Note''' We originally considered adding a creation event for the derivative files as well, but decided that it's not necessary as the event can be inferred from the derivation event and the PREMIS object relationships.&lt;br /&gt;
&lt;br /&gt;
'''Note''' &amp;quot;Derivation&amp;quot; is not an event type on the Library of Congress controlled vocabulary list at http://id.loc.gov/vocabulary/preservation/eventType.html. However, we have submitted it as a proposed new term (November 2015) at http://premisimplementers.pbworks.com/w/page/102413902/Preservation%20Events%20Controlled%20Vocabulary - a list of new terms that is being considered by the PREMIS Editorial Committee.&lt;br /&gt;
&lt;br /&gt;
'''Update''' ''April 2018'': The most recently available Event Type Controlled List (June 2017) does not yet include derivation as a controlled term: https://www.loc.gov/standards/premis/v3/preservation-events.pdf&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Original SPSS SAV file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;is source of&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[TAB file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;derivation&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;URI&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;premis:agentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierType&amp;gt;URI&amp;lt;/premis:agentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:agentIdentifierValue&amp;gt;http://dataverse.scholarsportal.info/dvn/&amp;lt;/premis:agentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:agentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentName&amp;gt;SP Dataverse Network&amp;lt;/premis:agentName&amp;gt;&lt;br /&gt;
&amp;lt;premis:agentType&amp;gt;organization&amp;lt;/premis:agentType&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Derivative TAB file&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
&amp;lt;premis:relationship&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipType&amp;gt;derivation&amp;lt;/premis:relationshipType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relationshipSubType&amp;gt;has source&amp;lt;/premis:relationshipSubType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierType&amp;gt;UUID&amp;lt;/premis:relatedObjectIdentifierType&amp;gt;&lt;br /&gt;
    &amp;lt;premis:relatedObjectIdentifierValue&amp;gt;[SPSS SAV file UUID]&amp;lt;/premis:relatedObjectIdentifierValue&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:relatedObjectIdentification&amp;gt;&lt;br /&gt;
&amp;lt;/premis:relationship&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
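&lt;br /&gt;
The reciprocal pair of derivation relationships above can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration, not Archivematica's own code. The PREMIS 2 namespace URI and the helper names q, derivation_relationship and link_original_and_derivative are assumptions made for this example.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

PREMIS_NS = 'info:lc/xmlns/premis-v2'  # assumption: PREMIS 2 namespace URI


def q(tag):
    # Qualify a tag name with the PREMIS namespace in ElementTree's {uri}tag form.
    return '{' + PREMIS_NS + '}' + tag


def derivation_relationship(sub_type, related_uuid):
    # Build one premis:relationship of type 'derivation' pointing at related_uuid.
    rel = ET.Element(q('relationship'))
    ET.SubElement(rel, q('relationshipType')).text = 'derivation'
    ET.SubElement(rel, q('relationshipSubType')).text = sub_type
    ident = ET.SubElement(rel, q('relatedObjectIdentification'))
    ET.SubElement(ident, q('relatedObjectIdentifierType')).text = 'UUID'
    ET.SubElement(ident, q('relatedObjectIdentifierValue')).text = related_uuid
    return rel


def link_original_and_derivative(original_uuid, derivative_uuid):
    # The original (e.g. the SPSS SAV file) 'is source of' the derivative,
    # and the derivative (e.g. the TAB file) 'has source' the original.
    return (
        derivation_relationship('is source of', derivative_uuid),
        derivation_relationship('has source', original_uuid),
    )
```
&lt;br /&gt;
The function returns the SAV-side element first and the TAB-side element second. Each element can then be serialized with ET.tostring() and placed in the corresponding object's PREMIS metadata.&lt;br /&gt;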
&lt;br /&gt;
=== Fixity check for checksums received from Dataverse ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierType&amp;gt;UUID&amp;lt;/premis:eventIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventIdentifierValue&amp;gt;[Event UUID assigned by Archivematica]&amp;lt;/premis:eventIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventType&amp;gt;fixity check&amp;lt;/premis:eventType&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDateTime&amp;gt;2015-08-21&amp;lt;/premis:eventDateTime&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventDetail&amp;gt;program=&amp;quot;python&amp;quot;; module=&amp;quot;hashlib.sha256()&amp;quot;&amp;lt;/premis:eventDetail&amp;gt;&lt;br /&gt;
&amp;lt;premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcome&amp;gt;Pass&amp;lt;/premis:eventOutcome&amp;gt;&lt;br /&gt;
  &amp;lt;premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
    &amp;lt;premis:eventOutcomeDetailNote&amp;gt;Dataverse checksum 91b65277959ec273763d28ef002e83a6b3fba57c7a3[...] verified&amp;lt;/premis:eventOutcomeDetailNote&amp;gt;&lt;br /&gt;
  &amp;lt;/premis:eventOutcomeDetail&amp;gt;&lt;br /&gt;
&amp;lt;/premis:eventOutcomeInformation&amp;gt;&lt;br /&gt;
&amp;lt;premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierType&amp;gt;preservation system&amp;lt;/premis:linkingAgentIdentifierType&amp;gt;&lt;br /&gt;
  &amp;lt;premis:linkingAgentIdentifierValue&amp;gt;Archivematica 1.4.1&amp;lt;/premis:linkingAgentIdentifierValue&amp;gt;&lt;br /&gt;
&amp;lt;/premis:linkingAgentIdentifier&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
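&lt;br /&gt;
The eventDetail above names Python's hashlib.sha256(). A fixity check of this kind can be sketched as follows. The sketch assumes the file is available locally and that Dataverse supplies a hex digest. The function name fixity_check, the chunk size and the wording of the failure note are hypothetical. Only the Pass note mirrors the example above.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib


def fixity_check(path, expected_checksum, algorithm='sha256', chunk_size=65536):
    # Hash the file in chunks so large data files need not fit in memory.
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    actual = digest.hexdigest()
    expected = expected_checksum.strip().lower()
    # Mirror the PREMIS eventOutcome / eventOutcomeDetailNote pair.
    if actual == expected:
        return 'Pass', 'Dataverse checksum ' + expected + ' verified'
    return 'Fail', 'Dataverse checksum ' + expected + ' does not match ' + actual
```
&lt;br /&gt;
The algorithm is a parameter rather than being fixed to SHA-256, in case a checksum arrives from Dataverse computed with a different hashlib algorithm (for example md5).&lt;br /&gt;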
&lt;br /&gt;
== AIP structure ==&lt;br /&gt;
&lt;br /&gt;
An Archival Information Package derived from a Dataverse ingest has the same basic structure as a generic Archivematica AIP, described at [[AIP_structure]]. It also includes additional metadata files, and each zipped bundle included in the ingest results in a separate directory in the AIP. A sample structure follows.&lt;br /&gt;
&lt;br /&gt;
'''Bag structure'''&lt;br /&gt;
&lt;br /&gt;
The Archival Information Package (AIP) is packaged in the Library of Congress BagIt format, and may be stored compressed or uncompressed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z&lt;br /&gt;
├── bag-info.txt&lt;br /&gt;
├── bagit.txt&lt;br /&gt;
├── manifest-sha512.txt&lt;br /&gt;
├── tagmanifest-md5.txt&lt;br /&gt;
└── data [standard bag directory containing contents of the AIP]&amp;lt;/pre&amp;gt;&lt;br /&gt;
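&lt;br /&gt;
Because the AIP is a BagIt bag, its payload can be checked against manifest-sha512.txt. The helper below, verify_bag_manifest, is a simplified and hypothetical sketch that assumes the bag has already been extracted from its .7z archive. It does not check the tag manifest, look for payload files missing from the manifest, or decode percent-encoded paths. For full validation, a dedicated tool such as the Library of Congress bagit-python library is a better choice.&lt;br /&gt;
&lt;br /&gt;
```python
import hashlib
import os


def verify_bag_manifest(bag_dir, manifest_name='manifest-sha512.txt'):
    # BagIt encodes the algorithm in the file name: manifest-ALGORITHM.txt.
    algorithm = manifest_name[len('manifest-'):-len('.txt')]
    failures = []
    with open(os.path.join(bag_dir, manifest_name), encoding='utf-8') as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            # Each line holds a checksum, whitespace, then a path relative to the bag root.
            expected, rel_path = line.split(maxsplit=1)
            digest = hashlib.new(algorithm)
            # A listed file that is absent raises FileNotFoundError here.
            with open(os.path.join(bag_dir, rel_path), 'rb') as payload:
                while True:
                    block = payload.read(65536)
                    if not block:
                        break
                    digest.update(block)
            if digest.hexdigest() != expected.lower():
                failures.append(rel_path)
    return failures
```
&lt;br /&gt;
An empty list means every listed payload file matched its manifest checksum. Otherwise, the list names the files whose checksums differ.&lt;br /&gt;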
&lt;br /&gt;
'''AIP structure'''&lt;br /&gt;
&lt;br /&gt;
All of the contents of the AIP reside within the data directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
├── data&lt;br /&gt;
│   ├── logs [log files generated during processing]&lt;br /&gt;
│   │   ├── fileFormatIdentification.log&lt;br /&gt;
│   │   └── transfers&lt;br /&gt;
│   │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│   │           └── logs&lt;br /&gt;
│   │               ├── extractContents.log&lt;br /&gt;
│   │               ├── fileFormatIdentification.log&lt;br /&gt;
│   │               └── filenameCleanup.log&lt;br /&gt;
│   ├── METS.dfb0b75d-6555-4e99-a8d8-95bed0f6303f.xml [the AIP METS file]&lt;br /&gt;
│   └── objects [a directory containing the digital objects being preserved, plus their metadata]&lt;br /&gt;
│       ├── chelan_052.jpg [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data.sav [an original file from Dataverse]&lt;br /&gt;
│       ├── Weather_data [a bundle retrieved from Dataverse]&lt;br /&gt;
│       │   ├── Weather_data.xml&lt;br /&gt;
│       │   ├── Weather_data.ris&lt;br /&gt;
│       │   ├── Weather_data-ddi.xml&lt;br /&gt;
│       │   └── Weather_data.tab [a TAB derivative file generated by Dataverse]&lt;br /&gt;
│       ├── metadata&lt;br /&gt;
│       │   └── transfers&lt;br /&gt;
│       │       └── Pacific_weather_patterns_study-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│       │           ├── agents.json [information about the source of the data, used to populate the PREMIS Dataverse agent in the AIP METS file]&lt;br /&gt;
│       │           ├── dataset.json [the full JSON file retrieved from Dataverse]&lt;br /&gt;
│       │           └── METS.xml [the METS file generated by the ingest script to prepare Dataverse contents for ingest into Archivematica]&lt;br /&gt;
│       └── submissionDocumentation&lt;br /&gt;
│           └── transfer-58-1a0f309a-d3ec-43ee-bb48-a868cd5ca85c&lt;br /&gt;
│               └── METS.xml [a standard transfer METS file generated to list all contents of an Archivematica transfer]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
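&lt;br /&gt;
In the sample, the AIP name and the AIP METS file share the same UUID (dfb0b75d-6555-4e99-a8d8-95bed0f6303f). The sketch below locates the METS file on two assumptions: that this naming pattern holds, and that an extracted AIP directory keeps the archive name minus its extension. The regular expression and both helper functions, split_aip_name and aip_mets_path, are illustrative assumptions and not part of Archivematica.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import re

AIP_NAME_RE = re.compile(
    r'^(.+)-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(\.[A-Za-z0-9.]+)?$'
)  # assumption: NAME-UUID, optionally followed by an archive extension such as .7z


def split_aip_name(aip_path):
    # Return (name, uuid) from an AIP archive or directory name.
    base = os.path.basename(aip_path.rstrip('/'))
    match = AIP_NAME_RE.match(base)
    if match is None:
        raise ValueError('no trailing UUID in AIP name: ' + base)
    return match.group(1), match.group(2)


def aip_mets_path(extracted_bag_dir):
    # The AIP METS file sits in the bag's data directory as METS.UUID.xml.
    _, uuid = split_aip_name(extracted_bag_dir)
    return os.path.join(extracted_bag_dir, 'data', 'METS.' + uuid + '.xml')
```
&lt;br /&gt;
For the sample above, split_aip_name('Pacific_weather_patterns_study-dfb0b75d-6555-4e99-a8d8-95bed0f6303f.7z') returns the study name Pacific_weather_patterns_study and the UUID dfb0b75d-6555-4e99-a8d8-95bed0f6303f.&lt;br /&gt;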
&lt;br /&gt;
'''AIP METS file structure'''&lt;br /&gt;
&lt;br /&gt;
The AIP METS file records information about the contents of the AIP and indicates the relationships between the various files in the AIP. A sample AIP METS file is structured as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
METS header&lt;br /&gt;
-Date METS file was created&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-DDI XML metadata taken from the METS transfer file, as follows:&lt;br /&gt;
--ddi:title&lt;br /&gt;
--ddi:IDno&lt;br /&gt;
--ddi:authEnty&lt;br /&gt;
--ddi:distrbtr&lt;br /&gt;
--ddi:version&lt;br /&gt;
--ddi:restrctn&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to dataset.json&lt;br /&gt;
METS dmdSec [descriptive metadata section]&lt;br /&gt;
-link to DDI.XML file created for derivative file as part of bundle&lt;br /&gt;
METS amdSec [administrative metadata section, one for each original, derivative and normalized file in the AIP]&lt;br /&gt;
-techMD [technical metadata]&lt;br /&gt;
--PREMIS technical metadata about a digital object, including file format information and extracted metadata&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: derivation (for derived formats)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: ingestion&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: unpacking (for bundled files)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: message digest calculation&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: virus check&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: format identification&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: fixity check (if file comes from Dataverse with a checksum)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: normalization (if file is normalized to a preservation format during Archivematica processing)&lt;br /&gt;
-digiprovMD [digital provenance metadata]&lt;br /&gt;
--PREMIS event: creation (if file is a normalized preservation master generated during Archivematica processing)&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: organization&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: software&lt;br /&gt;
-digiprovMD&lt;br /&gt;
--PREMIS agent: Archivematica user&lt;br /&gt;
METS fileSec [file section]&lt;br /&gt;
-fileGrp USE=&amp;quot;original&amp;quot; [file group]&lt;br /&gt;
--original files uploaded to Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;derivative&amp;quot;&lt;br /&gt;
--derivative tabular files generated by Dataverse&lt;br /&gt;
-fileGrp USE=&amp;quot;submissionDocumentation&amp;quot;&lt;br /&gt;
--METS.XML (standard Archivematica transfer METS file listing contents of transfer)&lt;br /&gt;
-fileGrp USE=&amp;quot;preservation&amp;quot;&lt;br /&gt;
--normalized preservation masters generated during Archivematica processing&lt;br /&gt;
-fileGrp USE=&amp;quot;metadata&amp;quot;&lt;br /&gt;
--dataset.json&lt;br /&gt;
--DDI.XML&lt;br /&gt;
--xcitation-endnote.xml&lt;br /&gt;
--xcitation-ris.ris&lt;br /&gt;
METS structMap [structural map]&lt;br /&gt;
-directory structure of the contents of the AIP&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Joel-simpson</name></author>
	</entry>
</feed>