Overview

From Archivematica
Revision as of 12:01, 30 April 2010 by Peter (talk | contribs)

Open Source OAIS

The Archivematica project integrates a number of open-source tools to create a comprehensive digital archives system that complies with the ISO-OAIS functional model and other digital preservation standards and best practices. All Archivematica code and documentation is released under GPL and Creative Commons licenses.

Virtualization technology

Using the latest in virtualization technology, each release of the system packages a customized Xubuntu environment as a virtual appliance, making it possible to run on top of any consumer-grade hardware and operating system, or even directly from a USB key. This means an entire suite of digital preservation tools is now available to the average archivist from one simple installation. At the same time, the Archivematica architecture allows it to be componentized and installed directly on dedicated hardware in a distributed, enterprise architecture to support large-scale, resource-intensive production environments.

Receiving files for ingest

Archivematica provides a template to create Submission Information Package (SIP) profiles based on qualified Dublin Core and METS elements. However, the system will accept files for ingest with as much or as little metadata as is available. It runs the SIP through a series of ingest processes: unpacking, checksum verification and creation, unique identification, quarantine, format identification, format validation, metadata extraction, and normalization. A variety of tools are used in these processes, including Easy Extract, Detox, UUID, ClamAV, Thunar, Incron, Flock, JHOVE, DROID, NLNZ Metadata Extractor, File, FFident, File Information Tool Set (FITS), Xena, OpenOffice, Unoconv, FFmpeg, ImageMagick, and Inkscape. The web-based Archivematica Dashboard monitors the progress of each SIP, logs the results of each process, reports any errors, and prompts the archivist to trigger subsequent processes.
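Two of the ingest steps named above, unique identification and checksum creation and verification, can be illustrated with a short Python sketch. This is not Archivematica's own code: the function names (`sha256_of`, `register_files`, `verify_files`), the choice of SHA-256, and the shape of the record dictionary are assumptions made for illustration only.

```python
import hashlib
import uuid
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_files(sip_dir: Path) -> dict:
    """Assign a random UUID to every file in the SIP and record its checksum.

    Hypothetical record layout: {uuid: {"path": ..., "sha256": ...}}.
    """
    records = {}
    for path in sorted(p for p in sip_dir.rglob("*") if p.is_file()):
        records[str(uuid.uuid4())] = {
            "path": path.relative_to(sip_dir).as_posix(),
            "sha256": sha256_of(path),
        }
    return records


def verify_files(sip_dir: Path, records: dict) -> list:
    """Return the relative paths that are missing or whose checksum changed."""
    failed = []
    for entry in records.values():
        target = sip_dir / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            failed.append(entry["path"])
    return failed
```

Calling `register_files` once at receipt and `verify_files` before each later step would flag any file that was altered or lost in between.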

Media type preservation plans

Archivematica maintains the original format of all ingested files to support migration and emulation preservation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica assigns each file format to a media type preservation plan (e.g. text, audio, video, raster image, vector image). Archivematica's preservation formats must all be open standards; additionally, the choice of formats is based on community best practices, the availability of free and open-source normalization tools, and an analysis of the significant properties of each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.
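The relationship between a file format, its media type, and that media type's preservation plan can be pictured as two lookup tables. The sketch below is a minimal illustration, not Archivematica's data model: the class name, the table names, and every format listed in them are hypothetical placeholders rather than the project's actual format policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreservationPlan:
    media_type: str
    preservation_format: str  # target format for long-term preservation
    access_format: str        # target format for web-based access


# Hypothetical plans: the formats here are placeholders for illustration.
PLANS = {
    "raster image": PreservationPlan("raster image", "tiff", "jpeg"),
    "vector image": PreservationPlan("vector image", "svg", "pdf"),
    "audio": PreservationPlan("audio", "flac", "mp3"),
}

# Hypothetical assignment of identified file formats to media types.
FORMAT_TO_MEDIA_TYPE = {
    "bmp": "raster image",
    "png": "raster image",
    "ai": "vector image",
    "wav": "audio",
}


def plan_for(file_format: str) -> Optional[PreservationPlan]:
    """Return the plan for an identified format, or None if none is assigned.

    A None result means no normalization takes place; the original file is
    retained in either case.
    """
    media_type = FORMAT_TO_MEDIA_TYPE.get(file_format.lower())
    if media_type is None:
        return None
    return PLANS[media_type]
```

Keeping the plan at the media type level means a newly supported file format only needs one new entry in the format table to inherit an existing plan.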

Preparing files for archival storage

Archivematica packages Archival Information Packages (AIPs) using qualified Dublin Core, PREMIS and METS elements and the Library of Congress's BagIt format. Archivematica can interact with any number of storage systems using standard protocols (NFS, CIFS, HTTP, etc.) to allow for the flexible implementation of an archival storage and backup strategy. Archival storage options include local hard disks, external hard disks, network attached storage devices, LOCKSS networks (e.g. MetaArchive, COPPUL), storage grids (e.g. iRODS, Bycast), and cloud storage (e.g. Amazon S3, Microsoft Azure).
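The BagIt layout mentioned above is simple enough to sketch with the Python standard library: a `data/` payload directory, a `bagit.txt` declaration, and a checksum manifest. The function below is an illustrative sketch only. The name `make_bag` is hypothetical, the version string and SHA-256 manifest follow the current BagIt 1.0 specification rather than whatever version an Archivematica release writes, and the Dublin Core, PREMIS and METS metadata that a real AIP carries are left out.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a new minimal BagIt bag at bag_dir.

    bag_dir must not already contain a data/ directory.
    """
    payload_dir = bag_dir / "data"
    # copytree creates bag_dir and data/ as needed.
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag root>" per file.
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

Because the bag is an ordinary directory tree with a self-describing manifest, it can be copied to any of the storage options listed above and checked for integrity after transfer.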

Making files available for access

Archivematica prepares default Dissemination Information Packages (DIPs) based on the designated access formats for each media type. Consumers can subsequently request AIP copies, but caching access copies is a much more scalable approach that will address the majority of access requests in the most performant manner (i.e. by reducing the bandwidth and time required to retrieve AIPs from archival storage and upload them to the Consumer). The DIP access derivatives are sent via a REST interface to a web-based application such as ICA-AtoM or Archon for further enhancement of descriptive metadata (using ISAD(G), EAD, DACS, etc.). These can then be arranged as accruals into existing archival descriptions to provide search and browse access to the institution's analogue and digital holdings from one common web-based interface. The Archivematica Dashboard manages the read and write operations of the AIP to file storage and also coordinates the syncing of metadata updates between the AIPs and the access system.

Lowering the barriers to best-practice digital preservation

The goal of the Archivematica project is to give archivists with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough OAIS use case and process analysis to synthesize the specific, concrete steps that must be carried out to comply with the OAIS functional model from Ingest to Access. Wherever possible, these steps are assigned to software tools within the Archivematica system. If a step cannot be automated in the current system iteration, it is documented and incorporated into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is carried out, even in the very early iterations of the system. In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools.

All of the software, documentation and development infrastructure are available free of charge and released under GPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.

OAIS reference model


Archivematica architecture - Dec 2009


Archivematica Dashboard - March 2010