Large datasets

From Archivematica
Revision as of 16:55, 11 February 2020 by Sallain (talk | contribs)

This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

What happens when a body of materials to be ingested consists of thousands of files (e.g., a large social science research dataset), or when a single file is extremely large (e.g., an HD video file)?

  • The large number of files could be broken up and distributed across multiple AIPs, with relationships between them expressed in the METS structMaps.
    • The dataset could be organized under a parent AIP that acts as an Archival Information Collection, consisting entirely of a METS structMap listing all its child AIPs; each child AIP would have a link back to the parent AIP in its own structMap.
  • The large single file could be broken into multiple segments, each in its own AIP. Video files could be delivered to end users in these segments, the way large video files are delivered on YouTube, for example.
    • Other types of large files might have to be merged back into one for delivery to a user.
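
The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Archivematica implements anything: the function names (split_file, merge_segments, sha256, parent_struct_map), the ".partNNNN" segment naming, the structMap TYPE and div TYPE values, and the use of plain identifiers in xlink:href are all hypothetical. The METS element and attribute names (structMap, div, mptr, LOCTYPE, OTHERLOCTYPE) come from the METS schema.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def split_file(source: Path, out_dir: Path, segment_size: int) -> list[Path]:
    """Write `source` as numbered segments of at most `segment_size` bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    segments = []
    with source.open("rb") as src:
        index = 0
        while chunk := src.read(segment_size):
            # Hypothetical naming scheme; zero-padding keeps segments sortable.
            segment = out_dir / f"{source.name}.part{index:04d}"
            segment.write_bytes(chunk)
            segments.append(segment)
            index += 1
    return segments


def merge_segments(segments: list[Path], target: Path) -> None:
    """Concatenate segments, in the order given, back into one file."""
    with target.open("wb") as dst:
        for segment in segments:
            dst.write(segment.read_bytes())


def sha256(path: Path) -> str:
    """Checksum a file in 1 MiB blocks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def parent_struct_map(parent_id: str, child_ids: list[str]) -> ET.Element:
    """Build a METS structMap for a parent AIP that only points at child AIPs."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    # TYPE values below are assumptions chosen for illustration.
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "logical"})
    root_div = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"TYPE": "collection", "LABEL": parent_id}
    )
    for order, child_id in enumerate(child_ids, start=1):
        div = ET.SubElement(
            root_div, f"{{{METS_NS}}}div", {"TYPE": "AIP", "ORDER": str(order)}
        )
        # mptr references an external METS document; here the child AIP.
        ET.SubElement(
            div,
            f"{{{METS_NS}}}mptr",
            {
                "LOCTYPE": "OTHER",
                "OTHERLOCTYPE": "AIP identifier",
                f"{{{XLINK_NS}}}href": child_id,
            },
        )
    return struct_map
```

In this sketch each segment would go into its own child AIP, and the parent structMap records their order so that a delivery system could either stream the segments in sequence or call merge_segments and compare checksums before handing the reassembled file to a user. A child AIP's link back to its parent could be expressed with the same mptr pattern in the child's own structMap.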