Difference between revisions of "Improvements/Islandora"

From Archivematica
m (Move to feature requirements category)
 
(37 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Sections of this page have been copied and adapted from the Islandora Foundation's Archidora documentation under a [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 Unported License]. These sections are appended by a hyperlink to the original content in the Islandora wiki. We are grateful to Islandora for both writing and sharing this documentation.
+
Sections of this page have been copied and adapted from the Islandora Foundation's [https://wiki.duraspace.org/display/ISLANDORA/Archidora Archidora documentation] under a [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 Unported License]. These sections are appended by a hyperlink to the original content in the Islandora wiki. We are grateful to the Islandora Foundation for both writing and sharing this documentation.
 +
 
 +
[[Category:Feature requirements]]
  
 
== Synopsis ==
 
== Synopsis ==
Line 7: Line 9:
 
== User story ==
 
== User story ==
  
The goal of the Archidora module is to allow Islandora users to seamlessly preserve content that is ingested into Islandora using Archivematica's suite of digital preservation micro-services, creating preservation copies of that content for long-term storage.
+
The goal of the Archidora module is to allow Islandora users to seamlessly preserve content that is ingested into Islandora using Archivematica's suite of digital preservation micro-services, creating preservation copies of that content for long-term storage. The Islandora user ingesting the content should not be required to mediate the transfer to Archivematica in any way. Upon completion of the transfer and ingest into Archivematica, a notification is sent back to Islandora indicating that the storage was successful.
  
[[File:Example.jpg]]
+
[[File:Archidora-1.png|500px]]
  
 
== Status ==
 
== Status ==
  
== Configuration ==
+
The Archidora module was developed in 2014 and has been deployed at the University of Saskatchewan Library since 2015. Testing is ongoing.
  
=== Download ===
+
The code is currently hosted [https://github.com/Islandora-Labs/archidora on GitHub by the Islandora Foundation], but is not being actively maintained.
  
=== Install ===
+
== Analysis ==
 +
 
 +
Building on the basic workflow described in the [[#User story|User story]] section above, the following detailed workflow describes how content is ingested from Islandora into Archivematica and stored.
 +
 
 +
# Content Upload
 +
## Content is ingested into Islandora
 +
## Drupal cron and Fedora content validation are used to trigger content upload to Archivematica
 +
## One upload is created per Fedora Object
 +
## Metadata includes an AIP ID generated either from user input or based on the collection metadata
 +
## METS.xml is posted to the Archivematica URI
 +
# Create Transfer
 +
## Archivematica parses and validates the METS.xml
 +
## After validating the METS, Archivematica sends either a 201 (Created) or 412 (Precondition failed) response to Islandora; Islandora saves the EM-IRI from the response and uses it to notify the user whether the transfer was successful
 +
## Archivematica identifies which files to request from Islandora
 +
## The transfer object is created
 +
## The file URIs are passed to Archivematica for asynchronous processing
 +
# Collect Files
 +
## Files are retrieved from the Fedora REST API using HTTP GET and added to the transfer; if there is an error, an error response is returned
 +
## Checksums are confirmed
 +
# Transfer and Ingest
 +
## Files are moved to the watchedDirectory
 +
## Transfer and Ingest are completed in Archivematica, either manually or automatically
 +
## A blank HTTP POST is sent to SE-IRI; if the HTTP POST is false, the transfer or ingest is still in progress
 +
# Check Archivematica status
 +
## Islandora sends an HTTP GET to the statement IRI to request the status of the transfer
 +
# Status Response
 +
## If the ingest is successful, Archivematica sends either a 201 (Created) or 412 (Precondition failed) response to Islandora, letting Islandora know that the last object in the AIP has been uploaded
 +
# Content deleted from Islandora
 +
## The Hi-Res datastream is deleted from Islandora (configurable), preserving only the access copy/ies of the content
 +
## Islandora sends an HTTP POST to EM-IRI to indicate that the content has been deleted
 +
# Log Islandora deletion
 +
## Archivematica marks objects as deleted from the access system
 +
## Search index is updated
 +
## Archivematica sends either a 200 (OK) or 400 (error) response to Islandora
 +
 
 +
[[File:Archidora-2.png|500px]]
 +
 
 +
== Download ==
 +
 
 +
* Islandora module: https://github.com/discoverygarden/archidora
 +
* Archivematica: Archivematica 1.4 and Storage Service 0.7 or later are required; download from http://www.archivematica.org.
 +
* This integration is currently considered a beta feature.
 +
 
 +
- ''[https://wiki.duraspace.org/display/ISLANDORA/Archidora Islandora Archidora documentation]''
 +
 
 +
== Install ==
 +
 
 +
Installation and testing are similar to those for any Drupal module. Please see [https://wiki.duraspace.org/display/ISLANDORA715/Installing+the+Islandora+Enhancement+Modules Installing the Islandora Enhancement Modules] for details.
 +
 
 +
- ''[https://wiki.duraspace.org/display/ISLANDORA/Archidora Islandora Archidora documentation]''
 +
 
 +
== Configure ==
 +
 
 +
=== In the Archivematica Storage Service: ===
 +
 
 +
* Create a Space with access protocol FEDORA via SWORD2.
 +
* Create a Location within that Space (purpose = FEDORA deposits)
 +
* Enter the Fedora URL, username and password.
 +
** See [https://www.archivematica.org/en/docs/storage-service-0.7/administrators/#fedora Archivematica Storage Service documentation] for more details.
 +
* Archivematica may also be configured to call back to Islandora to delete the high-res "OBJ" datastreams - this is done in the Storage Service > Administration > Service callbacks
 +
** URI: http://islandora-base-url/islandora/object/<source_id>/archidora/{Islandora API}/delete
 +
*** Where {Islandora API} is the "Islandora Archivematica integration API key" listed/generated on the Archidora admin screen
 +
** Event: post-store
 +
** Method: post
 +
** Expected status: 200
 +
 
 +
* Note: the OBJ datastreams are not deleted automatically, but rather are listed at the collection level (or compound object level) on the Manage | Archivematica tab. They can be deleted individually or in bulk.
 +
 
 +
=== On the Archivematica dashboard: ===
 +
 
 +
* On the administration tab, add the IP address of the storage service to the IP whitelist for the REST API
 +
 
 +
=== In Islandora ===
 +
 
 +
* Enable cron; configure it to run at reasonably frequent intervals (e.g. every five minutes), otherwise the expected callbacks may not be triggered often enough.
 +
 
 +
* Configure Archidora at admin/islandora/archidora.
 +
** Archivematica Storage Service Base URL - normally http://archivematica-url:8000
 +
** Deposit Location - will be configured automatically once storage service URL is entered
 +
** Archivematica User - Archivematica dashboard user to be used for Islandora integration (not storage service)
 +
** Archivematica API Key - API key for the Archivematica dashboard user listed above
 +
** EM-IRI Solr field - used for constructing the SWORD API call (default is "RELS_EXT_edit_media_uri_ms")
 +
** AIP max age - new objects will not be added to a deposit after the specified time has elapsed
 +
** AIP max size - new objects will not be added to a deposit after the specified size has been reached. Note that this is really the transfer size; the AIP could be larger due to normalized objects
 +
** Cron time - the amount of time for which the queue of items will be allowed to process, at each cron invocation. Setting a higher time is recommended if compound objects are being ingested (especially manually), otherwise the relationships may not be included in the METS file sent to Archivematica
 +
 
 +
* You may also need to add a rule to the firewall on the Fedora server to allow access from the Archivematica storage service (e.g. to port 8080)
 +
 
 +
* Collection-level configuration:
 +
** Check off "Don't Archive Children" to stop objects from being sent to Archivematica for a particular collection.
 +
 
 +
- ''[https://wiki.duraspace.org/display/ISLANDORA/Archidora Islandora Archidora documentation]''
 +
 
 +
== Scope ==
 +
 
 +
Archidora is still considered a beta feature. As such, further development is likely required to bring it to stability.
 +
 
 +
Other proposed improvements include the following:
 +
 
 +
* Ingest PREMIS from Islandora into Archivematica
 +
* Ingest other metadata (e.g., DDI)
 +
* Ingest Bags
 +
* Format Policy Registry Integration
 +
* Asynchronous derivative generation
 +
* Integrate with checksum checker
 +
* Provide more information back to Islandora (e.g., AIP URL)
 +
* Support other workflows (e.g. upload from Archivematica to Islandora)
 +
* Support Fedora for AIP Storage
 +
* Improve reporting/logging in Archidora
 +
 
 +
No estimates have been prepared for the above.
  
 
== Interest ==
 
== Interest ==
  
== Analysis ==
+
Please feel free to add your or your organization's name and any comments to this section if you have an interest in improving this module.
 +
 
 +
Artefactual would like to see active development on Archidora. We are able to do the development work, for a fee. We are also willing to assist others to complete all or part of the work required.
 +
 
 +
Interested in talking to us about sponsoring development of Archidora? Get in touch with Artefactual at info@artefactual.org or with Discovery Garden at info@discoverygarden.ca.
  
== Scope ==
+
Interested in contributing code to the Archidora project? [https://github.com/Islandora-Labs/archidora Create a pull request]!

Latest revision as of 13:49, 23 March 2017

Sections of this page have been copied and adapted from the Islandora Foundation's Archidora documentation under a Creative Commons Attribution-Share Alike 3.0 Unported License. These sections are appended by a hyperlink to the original content in the Islandora wiki. We are grateful to the Islandora Foundation for both writing and sharing this documentation.

Synopsis

Archidora is a module that integrates the digital preservation functionality of Archivematica with Islandora. It was developed by Artefactual Systems and Discovery Garden, sponsored by the University of Saskatchewan Library.

User story

The goal of the Archidora module is to allow Islandora users to seamlessly preserve content that is ingested into Islandora using Archivematica's suite of digital preservation micro-services, creating preservation copies of that content for long-term storage. The Islandora user ingesting the content should not be required to mediate the transfer to Archivematica in any way. Upon completion of the transfer and ingest into Archivematica, a notification is sent back to Islandora indicating that the storage was successful.

Archidora-1.png

Status

The Archidora module was developed in 2014 and has been deployed at the University of Saskatchewan Library since 2015. Testing is ongoing.

The code is currently hosted on GitHub by the Islandora Foundation, but is not being actively maintained.

Analysis

Building on the basic workflow described in the User story section above, the following detailed workflow describes how content is ingested from Islandora into Archivematica and stored.

  1. Content Upload
    1. Content is ingested into Islandora
    2. Drupal cron and Fedora content validation are used to trigger content upload to Archivematica
    3. One upload is created per Fedora Object
    4. Metadata includes an AIP ID generated either from user input or based on the collection metadata
    5. METS.xml is posted to the Archivematica URI
  2. Create Transfer
    1. Archivematica parses and validates the METS.xml
    2. After validating the METS, Archivematica sends either a 201 (Created) or 412 (Precondition failed) response to Islandora; Islandora saves the EM-IRI from the response and uses it to notify the user whether the transfer was successful
    3. Archivematica identifies which files to request from Islandora
    4. The transfer object is created
    5. The file URIs are passed to Archivematica for asynchronous processing
  3. Collect Files
    1. Files are retrieved from the Fedora REST API using HTTP GET and added to the transfer; if there is an error, an error response is returned
    2. Checksums are confirmed
  4. Transfer and Ingest
    1. Files are moved to the watchedDirectory
    2. Transfer and Ingest are completed in Archivematica, either manually or automatically
    3. A blank HTTP POST is sent to SE-IRI; if the HTTP POST is false, the transfer or ingest is still in progress
  5. Check Archivematica status
    1. Islandora sends an HTTP GET to the statement IRI to request the status of the transfer
  6. Status Response
    1. If the ingest is successful, Archivematica sends either a 201 (Created) or 412 (Precondition failed) response to Islandora, letting Islandora know that the last object in the AIP has been uploaded
  7. Content deleted from Islandora
    1. The Hi-Res datastream is deleted from Islandora (configurable), preserving only the access copy/ies of the content
    2. Islandora sends an HTTP POST to EM-IRI to indicate that the content has been deleted
  8. Log Islandora deletion
    1. Archivematica marks objects as deleted from the access system
    2. Search index is updated
    3. Archivematica sends either a 200 (OK) or 400 (error) response to Islandora
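
The status codes named in the workflow above can be handled with a small lookup. The following is a minimal sketch rather than Archidora's actual code: the step labels ("deposit", "log_deletion"), the `Outcome` enum, and the function name are hypothetical, and treating 201/200 as success and 412/400 as failure is an assumption based on ordinary HTTP semantics.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNEXPECTED = "unexpected"


# Status codes named in the workflow, grouped by the step that returns them.
# The step labels are hypothetical names used only in this sketch.
EXPECTED_CODES = {
    "deposit": {201: Outcome.SUCCESS, 412: Outcome.FAILURE},
    "log_deletion": {200: Outcome.SUCCESS, 400: Outcome.FAILURE},
}


def classify_response(step: str, status_code: int) -> Outcome:
    """Map an HTTP status code to an outcome for the given workflow step."""
    return EXPECTED_CODES[step].get(status_code, Outcome.UNEXPECTED)
```

Any code not listed for a step is reported as `Outcome.UNEXPECTED` so that a caller can log it instead of silently treating it as success.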

Archidora-2.png

Download

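  • Islandora module: https://github.com/discoverygarden/archidora
  • Archivematica: Archivematica 1.4 and Storage Service 0.7 or later are required; download from http://www.archivematica.org.
  • This integration is currently considered a beta feature.

- Islandora Archidora documentation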

Install

Installation and testing are similar to those for any Drupal module. Please see Installing the Islandora Enhancement Modules for details.

- Islandora Archidora documentation

Configure

In the Archivematica Storage Service:

  • Create a Space with access protocol FEDORA via SWORD2.
  • Create a Location within that Space (purpose = FEDORA deposits)
  • Enter the Fedora URL, username and password.
  • Archivematica may also be configured to call back to Islandora to delete the high-res "OBJ" datastreams - this is done in the Storage Service > Administration > Service callbacks
    • URI: http://islandora-base-url/islandora/object/<source_id>/archidora/{Islandora API}/delete
      • Where {Islandora API} is the "Islandora Archivematica integration API key" listed/generated on the Archidora admin screen
    • Event: post-store
    • Method: post
    • Expected status: 200
  • Note: the OBJ datastreams are not deleted automatically, but rather are listed at the collection level (or compound object level) on the Manage | Archivematica tab. They can be deleted individually or in bulk.
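
As an illustration of the callback URI pattern listed above, the sketch below assembles the value to enter in the Storage Service from an Islandora base URL and the API key. The function names and example values are hypothetical. The `<source_id>` placeholder is kept literally, on the assumption that the Storage Service substitutes the object identifier when the callback fires; `expand_for_object` only previews what such a substitution might look like.

```python
from urllib.parse import quote

# Placeholder that appears literally in the configured callback URI.
SOURCE_ID_PLACEHOLDER = "<source_id>"


def build_delete_callback_uri(islandora_base_url: str, api_key: str) -> str:
    """Build the callback URI to paste into the service callback form."""
    base = islandora_base_url.rstrip("/")
    safe_key = quote(api_key, safe="")  # percent-encode any unsafe characters
    return (
        f"{base}/islandora/object/{SOURCE_ID_PLACEHOLDER}"
        f"/archidora/{safe_key}/delete"
    )


def expand_for_object(uri_template: str, pid: str) -> str:
    """Preview the URI for one object (illustrative substitution only)."""
    return uri_template.replace(SOURCE_ID_PLACEHOLDER, pid)
```

For example, `build_delete_callback_uri("http://islandora.example.org/", "abc123")` yields a URI ending in `/islandora/object/<source_id>/archidora/abc123/delete`.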

On the Archivematica dashboard:

  • On the administration tab, add the IP address of the storage service to the IP whitelist for the REST API

In Islandora

  • Enable cron; configure it to run at reasonably frequent intervals (e.g. every five minutes), otherwise the expected callbacks may not be triggered often enough.
  • Configure Archidora at admin/islandora/archidora.
    • Archivematica Storage Service Base URL - normally http://archivematica-url:8000
    • Deposit Location - will be configured automatically once storage service URL is entered
    • Archivematica User - Archivematica dashboard user to be used for Islandora integration (not storage service)
    • Archivematica API Key - API key for the Archivematica dashboard user listed above
    • EM-IRI Solr field - used for constructing the SWORD API call (default is "RELS_EXT_edit_media_uri_ms")
    • AIP max age - new objects will not be added to a deposit after the specified time has elapsed
    • AIP max size - new objects will not be added to a deposit after the specified size has been reached. Note that this is really the transfer size; the AIP could be larger due to normalized objects
    • Cron time - the amount of time for which the queue of items will be allowed to process, at each cron invocation. Setting a higher time is recommended if compound objects are being ingested (especially manually), otherwise the relationships may not be included in the METS file sent to Archivematica
  • You may also need to add a rule to the firewall on the Fedora server to allow access from the Archivematica storage service (e.g. to port 8080)
  • Collection-level configuration:
    • Check off "Don't Archive Children" to stop objects from being sent to Archivematica for a particular collection.
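
The "AIP max age" and "AIP max size" settings above both close a deposit to new objects. A minimal sketch of that rule follows; the `Deposit` class, its field names, and the choice of bytes and `timedelta` as units are assumptions made for illustration and are not taken from the Archidora code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deposit:
    """Hypothetical record of an open deposit (transfer) in Archivematica."""
    created: datetime
    size_bytes: int


def accepts_new_objects(deposit: Deposit, now: datetime,
                        max_age: timedelta, max_size_bytes: int) -> bool:
    """Return True while the deposit is under both the age and size limits."""
    too_old = now - deposit.created >= max_age
    too_big = deposit.size_bytes >= max_size_bytes
    return not (too_old or too_big)
```

Note that the size checked here is the transfer size; as stated above, the resulting AIP could be larger due to normalized objects.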

- Islandora Archidora documentation

Scope

Archidora is still considered a beta feature. As such, further development is likely required to bring it to stability.

Other proposed improvements include the following:

  • Ingest PREMIS from Islandora into Archivematica
  • Ingest other metadata (e.g., DDI)
  • Ingest Bags
  • Format Policy Registry Integration
  • Asynchronous derivative generation
  • Integrate with checksum checker
  • Provide more information back to Islandora (e.g., AIP URL)
  • Support other workflows (e.g. upload from Archivematica to Islandora)
  • Support Fedora for AIP Storage
  • Improve reporting/logging in Archidora

No estimates have been prepared for the above.

Interest

Please feel free to add your or your organization's name and any comments to this section if you have an interest in improving this module.

Artefactual would like to see active development on Archidora. We are able to do the development work, for a fee. We are also willing to assist others to complete all or part of the work required.

Interested in talking to us about sponsoring development of Archidora? Get in touch with Artefactual at info@artefactual.org or with Discovery Garden at info@discoverygarden.ca.

Interested in contributing code to the Archidora project? Create a pull request!