Enhance DIP Dissemination Workflows

Synopsis

Archivematica can produce Dissemination Information Packages (DIPs) designed for ingest into several archival description and access systems (see the Archivematica documentation).

The current DIP workflows in Archivematica produce DIPs in parallel with AIPs, and the upload to the access system (typically AtoM, with some support for CONTENTdm, Archivists' Toolkit and ArchivesSpace) is performed during initial processing.

Proposal

Rather than producing the DIP during the AIP creation workflow, DIPs could be generated after the AIP creation workflow, by downloading the AIP from the Archivematica Storage Service, transforming the AIP into the required DIP structure, and then uploading to AtoM, or disseminating via another method.

Advantages

This approach has a number of advantages:

  • A DIP can no longer be uploaded when creation of the corresponding AIP fails (which can happen with the current approach).
  • The development work of generating custom DIP formats is not tied to public Archivematica releases, and does not require any changes to Archivematica or the Storage Service.
  • Previously existing AIPs can have DIPs generated without requiring re-ingest in Archivematica.
  • The DIP can keep the original hierarchical structure of the AIP (as opposed to the current method, where the original folder structure is flattened).
  • It is possible to make other transformations to DIP contents (e.g., restore original file names)
  • The approach opens the door to possible future enhancements, such as the ability to create a DIP from multiple AIPs, or the ability to define custom structures for DIPs required in specific circumstances
  • The approach does not require the use of any particular access system. The DIPs produced can be uploaded to

Limitations

Limitations of this approach include:

  • It is not currently possible to include access derivatives in a DIP produced after the AIP creation workflow. This is not a limitation in all cases (e.g., research data, where often no access format is defined). The ability to generate access derivatives after the AIP creation workflow is a desired feature, which could be supported with enhancements to the Format Policy Registry.
  • It is not yet clear how to document the requirements for a particular DIP format.
  • No record of the creation or dissemination of the DIP will be kept in the Archivematica Dashboard or the Storage Service (not much different from the current workflow).

Workflow

The proposed DIP dissemination workflow can be described at a high level as:

  1. Produce AIP (containing original objects, preservation derivatives, and metadata in METS and possibly other formats)
  2. Download AIP (from the Storage Service via the REST API)
  3. Alter physical structure of contents as required (e.g., remove preservation derivatives, remove logs)
  4. Read metadata from the AIP and transform it into the format required for the DIP (e.g., create a new DIP METS file)
  5. Package DIP as required (e.g., create a new bag and/or zip file)
  6. Transfer DIP to access system (e.g., POST to the AtoM SWORD endpoint)
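
Step 2 of this workflow could be sketched as follows. This is a minimal illustration, not part of Archivematica: the function names are hypothetical, and both the endpoint path (/api/v2/file/<uuid>/download/) and the ApiKey authorization header are assumptions about the Storage Service REST API that should be checked against its documentation.

```python
import shutil
import urllib.request


def build_aip_download_request(ss_url, aip_uuid, username, api_key):
    """Build (but do not send) a GET request for downloading an AIP."""
    # Assumed endpoint layout: <ss_url>/api/v2/file/<uuid>/download/
    url = f"{ss_url.rstrip('/')}/api/v2/file/{aip_uuid}/download/"
    request = urllib.request.Request(url, method="GET")
    # Assumed authentication scheme: "ApiKey <username>:<key>"
    request.add_header("Authorization", f"ApiKey {username}:{api_key}")
    return request


def download_aip(request, destination):
    """Send the request and stream the response body to a local file."""
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Separating request construction from the download itself keeps the URL and header logic easy to check without a running Storage Service.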

Possible Implementation

A basic proof of concept of this workflow could be developed by producing two new standalone scripts. Assuming that the access system is AtoM, and that the DIP should contain only original files (not access derivatives), simplifies the work required in an initial iteration.

  1. DIP Creation Script

Alter the structure of a supplied AIP to turn it into the required DIP (based on a pre-defined required DIP structure and metadata extracted from the AIP METS file). This would require defining the desired physical contents of the DIP, and the desired metadata serialization.
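
A minimal sketch of such a script is shown here, assuming the AIP has already been downloaded and extracted, and that its METS file groups files in fileGrp elements whose USE attribute distinguishes originals from derivatives. The function names, and the choice to select files by USE="original", are illustrative assumptions rather than an agreed DIP specification.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

NAMESPACES = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def original_file_paths(mets_path):
    """Return the relative paths of files listed in the 'original' file group."""
    root = ET.parse(mets_path).getroot()
    xpath = ".//mets:fileGrp[@USE='original']/mets:file/mets:FLocat"
    return [
        element.get(XLINK_HREF)
        for element in root.iterfind(xpath, NAMESPACES)
        if element.get(XLINK_HREF)
    ]


def build_dip(aip_data_dir, mets_path, dip_dir):
    """Copy only the original files into dip_dir, keeping their hierarchy."""
    aip_data_dir, dip_dir = Path(aip_data_dir), Path(dip_dir)
    copied = []
    for relative_path in original_file_paths(mets_path):
        target = dip_dir / relative_path
        # Recreate the original folder structure instead of flattening it.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(aip_data_dir / relative_path, target)
        copied.append(relative_path)
    return copied
```

Because files are copied by relative path, preservation derivatives and logs are simply left behind. The resulting directory could then be packaged with, for example, shutil.make_archive(str(dip_dir), "zip", root_dir=dip_dir).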

  2. DIP Upload Script

This script would be based on the current DIP-upload-to-AtoM script in Archivematica; the only change required would be to simplify it so that it can run outside of an Archivematica processing pipeline. AtoM also provides CLI support for DIP upload, which could be used instead.
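
A standalone upload step might look roughly like the sketch below. Everything specific here is an assumption to be verified against the existing upload script and the AtoM documentation: the /sword/deposit/<slug> path, the header names and values, the use of HTTP basic authentication, and the premise that the DIP has already been copied to a location the AtoM server can read. The function name is hypothetical.

```python
import base64
import urllib.request


def build_sword_deposit_request(atom_url, slug, dip_name, email, password):
    """Build (but do not send) a deposit request for a DIP already on the server."""
    # Assumed endpoint layout: <atom_url>/sword/deposit/<target description slug>
    url = f"{atom_url.rstrip('/')}/sword/deposit/{slug}"
    request = urllib.request.Request(url, data=b"", method="POST")
    # Assumed authentication: HTTP basic auth with an AtoM user account.
    token = base64.b64encode(f"{email}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    # Assumed headers describing the package and where AtoM can find it.
    request.add_header("Content-Type", "application/zip")
    request.add_header(
        "X-Packaging", "http://purl.org/net/sword-types/METSArchivematicaDIP"
    )
    request.add_header("Content-Location", f"file:///{dip_name}")
    return request
```

The request would be sent with urllib.request.urlopen(request) once the DIP is in place on the AtoM side.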

DIP Structure Requirements

There are a number of use cases where a DIP's structure needs to differ from the original AIP's structure.

One use case is CAD files. A CAD file can contain links to a number of other files (fonts, images, etc.) that need to be present for the CAD file to render properly. It may not be desirable to produce a DIP with all the files listed separately. In AtoM, for example, this would result in a separate archival description being generated for each font file. An alternative is to produce a single zip file, containing the CAD file and all required linked files, and present this to AtoM as a single entity. Only one archival description is then generated in AtoM, and a user can download the zip file, extract it, and have a fully working CAD file (assuming they have software available to open it).
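
The zip-based alternative could be sketched as below. The function name is hypothetical, and the sketch assumes that the list of linked files is already known and that they all sit in or below the CAD file's own directory, so that relative links still resolve after extraction.

```python
import zipfile
from pathlib import Path


def bundle_cad_drawing(cad_file, linked_files, zip_path):
    """Write the CAD file and its linked files into a single zip archive."""
    cad_file = Path(cad_file)
    base_dir = cad_file.parent
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in [cad_file, *map(Path, linked_files)]:
            # Store paths relative to the CAD file so its links keep working.
            # relative_to() raises ValueError for files outside base_dir.
            archive.write(path, arcname=path.relative_to(base_dir).as_posix())
    return zip_path
```

The resulting zip file would then be the only object placed in the DIP for that drawing, so the access system sees one entity rather than one per linked file.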