PREMIS/METS for scalability

Revision as of 13:57, 20 February 2018 by Evelyn McLellan (talk | contribs)

The scalability issue

SIPs with very large numbers of files (10,000 or more) tend to produce very large METS files. This can cause workflow failures and problems with indexing, storing and parsing the metadata. The challenge is to reduce the verbosity of the PREMIS and METS elements without removing information required for preservation.

An approach to the problem

Currently, a standard Archivematica METS file is based entirely on the description of PREMIS Files. Each File has its own METS amdSec, containing the following: a techMD; multiple Events, each in its own digiprovMD; and three Agents, each in its own digiprovMD. This means that every File referenced in the METS fileSec has its own linked amdSec, so the number of amdSecs, and of the Events and Agents repeated inside them, grows in step with the number of files.
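To make the growth concrete, the following is a minimal sketch in Python that builds a skeleton of the per-File layout described above and counts the resulting digiprovMD elements. It is an illustration only, not Archivematica's METS-writing code: the function name build_per_file_mets, the ID patterns and the figure of eight Events per File are assumptions chosen for the example, and the fileSec and all PREMIS content are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Return the namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def build_per_file_mets(n_files, events_per_file, agents_per_file=3):
    """Build a bare METS skeleton with one amdSec per File.

    Hypothetical helper: each amdSec gets one techMD plus one digiprovMD
    per Event and per Agent, mirroring the layout described in the text.
    """
    mets = ET.Element(q("mets"))
    for i in range(n_files):
        amd = ET.SubElement(mets, q("amdSec"), ID=f"amdSec_{i + 1}")
        ET.SubElement(amd, q("techMD"), ID=f"techMD_{i + 1}")
        for j in range(events_per_file + agents_per_file):
            ET.SubElement(amd, q("digiprovMD"), ID=f"digiprovMD_{i + 1}_{j + 1}")
    return mets


# events_per_file=8 is an assumed value, not a figure from Archivematica.
mets = build_per_file_mets(n_files=10_000, events_per_file=8)
amdsec_count = len(mets.findall(q("amdSec")))
digiprov_count = len(mets.findall(f".//{q('digiprovMD')}"))
print(amdsec_count, digiprov_count)  # 10000 110000
```

Under these assumptions a 10,000-file SIP carries 110,000 digiprovMD wrappers before any PREMIS content is added, of which 30,000 only repeat the same three Agents.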

We propose using a PREMIS Representation as the level at which to capture information about Events that are common to all Files, and a single amdSec with PREMIS Agents for the entire AIP. This is more feasible with the introduction of PREMIS 3.0, which provides a means for creating an Intellectual Entity and one or more Representations of that Entity. The following diagram depicts the proposed approach:

[Figure: Reduced METS.png]
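As a rough, back-of-the-envelope comparison of the two layouts, the sketch below counts digiprovMD-style entries under each. It assumes that Events common to all Files are recorded once at the Representation level, that Agents are recorded once for the whole AIP, and that any File-specific Events stay with their File. The function name digiprov_counts and the split of six common and two File-specific Events are hypothetical values chosen for illustration, not measurements.

```python
def digiprov_counts(n_files, common_events, file_specific_events, agents=3):
    """Return (current, proposed) counts of Event and Agent entries.

    current:  every File repeats all Events and all Agents in its own amdSec.
    proposed: common Events and Agents are stored once; only File-specific
              Events remain per File (an assumption of this sketch).
    """
    current = n_files * (common_events + file_specific_events + agents)
    proposed = n_files * file_specific_events + common_events + agents
    return current, proposed


# Assumed split: 6 Events shared by all Files, 2 specific to each File.
current, proposed = digiprov_counts(
    10_000, common_events=6, file_specific_events=2
)
print(current, proposed)  # 110000 20009
```

The saving depends entirely on how many Events really are common to every File; if none are, only the repeated Agents are eliminated.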