Difference between revisions of "Overview"

From Archivematica
Line 6: Line 6:
  
 
==Open Source OAIS==
 
The Archivematica project is integrating a number of open-source tools to create a comprehensive digital archives system that is compliant with the [http://en.wikipedia.org/wiki/Open_Archival_Information_System ISO-OAIS] functional model and other [[wikipedia:Digital preservation|digital preservation]] standards and best practices. All of the Archivematica code and documentation is re-released under GPL and Creative Commons open-source licenses.
+
Archivematica provides an integrated suite of free and open-source tools that allows users to process digital objects from ingest to access in compliance with the [http://en.wikipedia.org/wiki/Open_Archival_Information_System ISO-OAIS] functional model and other [[wikipedia:Digital preservation|digital preservation]] standards and best practices. All of the Archivematica code and documentation is released under GPL and Creative Commons open-source licenses.
  
 
==Micro-Services design pattern==
 
Archivematica implements a [[micro-services]] design pattern which is an alternative to digital preservation systems that are based on repository systems and J2EE frameworks. These are often too complex for small and medium-sized institutions to deploy and maintain. Instead of relying on a repository interface to a digital object store, the micro-services approach uses loosely-coupled tools to provide granular and orthogonal digital preservation services built around file-system storage. This reduces technical complexity for development and maintenance but is also noteworthy as a long-term preservation strategy because it provides archivists with direct, unmediated access to archival storage. Furthermore, file system technology is long-proven and extremely robust, typically outlasting the lifespan of enterprise information systems.
+
Archivematica implements a [http://www.cdlib.org/services/uc3/curation/ micro-service] approach to digital preservation. The Archivematica micro-services are granular system tasks which operate on a conceptual entity that is equivalent to an OAIS information package: Submission Information Package (SIP), Archival Information Package (AIP) or Dissemination Information Package (DIP). The physical structure of an information package will include files, checksums, logs, XML metadata, etc.
  
==Virtualization technology==
+
These information packages are moved from one service to the next using the well-established [http://en.wikipedia.org/wiki/Pipeline_%28Unix%29 Unix pipeline] design pattern. Each micro-service is defined in a simple XML configuration file and associated with a watched directory. When an information package is moved to that directory it triggers the micro-service.  
Using the latest in virtualization technology, each release of the Archivematica system packages a customized Xubuntu environment as a [http://en.wikipedia.org/wiki/Virtual_appliance virtual appliance], making it possible to run on top of any consumer-grade hardware and operating system, or even directly from a USB key. This means an entire [[External tools|suite of digital preservation tools]] is now available to the average archivist from one simple installation. At the same time, Archivematica can be installed directly on dedicated hardware and its loosely-coupled architecture allows it to be deployed in multi-node, distributed processing configurations to support large-scale, resource-intensive production environments.
 
  
==Receiving files for ingest==
+
Each service is provided by a combination of Archivematica Python scripts and one or more of the free, open-source [[External tools|software tools]] bundled in the Archivematica system. Each micro-service results in a success or error state and the information package is moved accordingly to a success or error directory. Each success or error directory is the watched directory for a subsequent micro-service. This allows for the chaining of directories into complex, custom workflows.
Archivematica provides the ability to create SIP profiles using qualified Dublin Core and METS elements. However, the system will accept files for ingest with as much or as little metadata as is available. The user can take a folder of files and format it for ingest by right-clicking and running a pre-ingest script; the script creates a specific folder structure and adds a dublin.core.xml file and a checksum manifest. Once processing is started, Archivematica runs the SIP through a series of ingest processes including unpacking, checksum verification and creation, unique identification, quarantine, format identification, format validation, metadata extraction and normalization. A variety of tools are used in each of these processes, including Easy Extract, Detox, UUID, CLAM AV, Thunar, Incron, Flock, File Information Tool Set (FITS), OpenOffice, Ghostscript, FFmpeg, ImageMagick, Readpst and Inkscape. Archivematica monitors the progress of each SIP, logs the results of each process, reports on any errors and prompts the archivist to trigger subsequent processes. Archivematica uses the Thunar file manager as its graphical user interface. Release 0.7 added a web-based dashboard to monitor and control the Archivematica workflow processes.
+
 
 +
Archivematica implements a [[Micro-services#Archivematica_Micro-services|default ingest to access workflow]] that is [[Requirements|compliant with the ISO-OAIS]] functional model. Micro-services can be distributed to processing clusters for highly scalable configurations. The Thunar file manager and a web-based dashboard allow users to process, monitor and control the Archivematica workflow processes.
 +
 
 +
==Single install==
 +
Using the latest in virtualization technology, each release of the Archivematica system packages a customized Xubuntu environment as a [http://en.wikipedia.org/wiki/Virtual_appliance virtual appliance], making it possible to run on top of any consumer-grade hardware and operating system, or even directly from a USB key. This means the entire [[External tools|suite of digital preservation tools]] is now available from one simple installation. Archivematica can also be installed directly on dedicated hardware via its own Ubuntu repository. Its client/server processing architecture allows it to be deployed in multi-node, distributed processing configurations to support large-scale, resource-intensive production environments.
  
 
==Media type preservation plans==
 
 
Archivematica maintains the original format of all ingested files to support migration and emulation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica groups file formats into [[Media_type_preservation_plans|media type preservation plan]]  (e.g. text, audio, video, raster image, vector image, etc.). Archivematica's preservation formats must all be open standards. Additionally, the choice of formats is based on community best practices, availability of free and open-source normalization tools, and an analysis of the significant characteristics for each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.
 
  
The Archivematica media-type preservation plans will be moved to a structured, online [http://rdf.freebase.com/rdf/base.digitalformatpolicies format policy registry] that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. The goal is to make this registry interoperable with [http://www.nationalarchives.gov.uk/PRONOM/Default.aspx PRONOM], the upcoming [http://www.udfr.org/ UDFR] registry, the [http://testbed.planets-project.eu/testbed/public/about.faces Planets Testbed] and risk assessment methodologies like those being developed for the [http://p2-registry.ecs.soton.ac.uk/ Preserve2 registry]. Archivematica installations will use the registry to update their local, default policies and notify users if there has been a change in the risk status or migration options for these formats, allowing them to trigger a migration process using the available normalization tools. Users are free to determine their own preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it easy to add new normalization tools and customize the media-type preservation plans.
+
The Archivematica media-type preservation plans will be moved to a structured, online [http://rdf.freebase.com/rdf/base.digitalformatpolicies format policy registry] that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. The goal is to make this registry interoperable with [http://www.nationalarchives.gov.uk/PRONOM/Default.aspx PRONOM], and the forthcoming [http://www.udfr.org/ UDFR] and Open Planets Foundation registries. Archivematica installations will use the registry to update their local, default policies and notify users if there has been a change in the risk status or migration options for these formats, allowing them to trigger a migration process using the available normalization tools. Users are free to determine their own preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it easy to add new normalization tools and customize the media-type preservation plans.
 
 
==Preparing files for archival storage==
 
Archivematica creates Archival Information Packages (AIPs) using qualified Dublin Core, PREMIS and METS elements and Library of Congress’ Bagit format. Archivematica is able to interact with any number of storage systems using standard protocols (NFS, CIFS, HTTP, etc.) to allow for the flexible implementation of an archival storage and backup strategy. Standard operating system utilities can be used to provide backup functionality. Archival storage options range from local hard disk, external hard disks, network attached storage devices, LOCKSS networks (e.g. MetaArchive, COPPUL), storage grids (e.g. iRODS, Bycast), cloud storage (e.g. Amazon S3, Microsoft Azure), etc.. Ideally, the storage platform provides its own fixity check functionality (e.g. Sun ZFS, LOCKSS, iRODS) but for those that do not, a fixity check daemon will be added to Archivematica in release 0.8.
 
  
==Making files available for access==
+
==From SIP to AIP and DIP==
Archivematica prepares default Dissemination Information Packages (DIP) which are based on the designated access formats for each media type. Consumers can subsequently request AIP copies but caching access copies is a much more scalable approach that will address the majority of access requests in the most performant manner (i.e. reducing the bandwidth and time required to retrieve AIPs from archival storage and uploading them to the Consumer). The DIP access derivatives are sent via a REST interface to a web-based application such as ICA-AtoM or Archon for further enhancement of descriptive metadata (using ISAD(G), EAD, DACS, etc).  These can then be arranged as accruals into existing archival descriptions to provide search and browse access to the institution’s analogue and digital holdings from one common web-based interface. The Archivematica dashboard manages the read and write operations of the AIP to file storage and in future releases will coordinate the syncing of metadata updates between the AIPs and the access system.
+
The primary function of Archivematica is to process SIPs, apply media-type preservation plans and create repository-independent Archival Information Packages (AIP) using METS, PREMIS and BagIt. Archivematica is bundled with ICA-AtoM but is designed to upload Dissemination Information Packages (DIP), containing descriptive metadata and web-ready access copies, to any access system (e.g. DSpace, CONTENTdm, etc.).
  
 
==Lowering the barriers to best-practice digital preservation==
 
The goal of the Archivematica project is to give archivists with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough [[OAIS Use Cases|OAIS use case]] and process analysis to synthesize the specific, [[UML Activity Diagrams|concrete steps]] that must be carried out to comply with the OAIS functional model from Ingest to Access. Wherever possible, these steps are assigned to software tools within the Archivematica system. If it is not possible to automate these steps in the current system iteration, they are incorporated and [[Documentation|documented]] into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is being carried out, even in the very early iterations of the system. In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools.
+
The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough [[OAIS Use Cases|OAIS use case]] and process analysis to synthesize the specific, [[UML Activity Diagrams|concrete steps]] that must be carried out to comply with the OAIS functional model from Ingest to Access. Wherever possible, these steps are assigned to software tools within the Archivematica system. If it is not possible to automate these steps in the current system iteration, they are [[Documentation|documented]] and incorporated into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is being carried out, even in the early, pre-1.0 system releases. In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools.
  
 
All of the software, documentation and development infrastructure are available free of charge and released under GPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.
 
Line 36: Line 36:
  
 
[[File:OAIS.png|thumb|left|300px|OAIS reference model]]
 
 +
 +
[[File:Micro-service.png|thumb|left|300px|Archivematica micro-service structure]]
  
 
[[File:pre-ingest-1.png|thumb|left|300px|Archivematica 0.7-alpha: Running the pre-ingest script]]
 
Line 41: Line 43:
 
[[File:Dashboard-0.7.png |thumb|left|300px|Archivematica 0.7-alpha Dashboard]]
 
 
|}
 
|}
 +
 +
  
  
 
__NOTOC__
 

Revision as of 23:27, 20 February 2011


Open Source OAIS

Archivematica provides an integrated suite of free and open-source tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model and other digital preservation standards and best practices. All of the Archivematica code and documentation is released under GPL and Creative Commons open-source licenses.

Micro-Services design pattern

Archivematica implements a micro-service approach to digital preservation. The Archivematica micro-services are granular system tasks which operate on a conceptual entity that is equivalent to an OAIS information package: Submission Information Package (SIP), Archival Information Package (AIP) or Dissemination Information Package (DIP). The physical structure of an information package will include files, checksums, logs, XML metadata, etc.

These information packages are moved from one service to the next using the well-established Unix pipeline design pattern. Each micro-service is defined in a simple XML configuration file and associated with a watched directory. When an information package is moved to that directory it triggers the micro-service.
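
As a rough illustration of a micro-service defined in an XML configuration file, the sketch below parses one such definition with Python's standard library. The element names (microService, name, watchDirectory, command), the paths and the script name are assumptions made for this example; the actual Archivematica configuration schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout. The element names and values are
# assumptions for illustration, not Archivematica's actual schema.
EXAMPLE_CONFIG = """
<microService>
  <name>verifyChecksums</name>
  <watchDirectory>/tmp/watched/verifyChecksums</watchDirectory>
  <command>verify-checksums.py</command>
</microService>
"""


def load_micro_service(xml_text):
    """Return the service name, watched directory and command as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name"),
        "watch_directory": root.findtext("watchDirectory"),
        "command": root.findtext("command"),
    }


service = load_micro_service(EXAMPLE_CONFIG)
print(service["name"], "watches", service["watch_directory"])
```

Keeping each definition this small means a workflow can be rearranged by editing configuration rather than code.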

Each service is provided by a combination of Archivematica Python scripts and one or more of the free, open-source software tools bundled in the Archivematica system. Each micro-service results in a success or error state and the information package is moved accordingly to a success or error directory. Each success or error directory is the watched directory for a subsequent micro-service. This allows for the chaining of directories into complex, custom workflows.
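
The success and error routing described above can be sketched as follows. This is a minimal illustration and not Archivematica's own code: the function names are hypothetical, a single polling pass stands in for whatever directory-watching mechanism is actually used, and the exit status of the command is assumed to signal success (0) or error (non-zero).

```python
import shutil
import subprocess
import sys
from pathlib import Path


def run_micro_service(package, command, success_dir, error_dir):
    """Run `command` on one package, then move it to the success or error directory."""
    # The package path is appended as the last argument of the command.
    result = subprocess.run(command + [str(package)])
    target = Path(success_dir if result.returncode == 0 else error_dir)
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(package), str(target / package.name)))


def process_watched_directory(watched_dir, command, success_dir, error_dir):
    """Process every package currently in the watched directory (one polling pass)."""
    moved = []
    for package in sorted(Path(watched_dir).iterdir()):
        moved.append(run_micro_service(package, command, success_dir, error_dir))
    return moved
```

Chaining then amounts to passing one service's success directory as the watched directory of the next, for example `process_watched_directory("step2", [sys.executable, "identify.py"], "step3", "step2-failed")`, where the directory and script names are placeholders.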

Archivematica implements a default ingest to access workflow that is compliant with the ISO-OAIS functional model. Micro-services can be distributed to processing clusters for highly scalable configurations. The Thunar file manager and a web-based dashboard allow users to process, monitor and control the Archivematica workflow processes.

Single install

Using the latest in virtualization technology, each release of the Archivematica system packages a customized Xubuntu environment as a virtual appliance, making it possible to run on top of any consumer-grade hardware and operating system, or even directly from a USB key. This means the entire suite of digital preservation tools is now available from one simple installation. Archivematica can also be installed directly on dedicated hardware via its own Ubuntu repository. Its client/server processing architecture allows it to be deployed in multi-node, distributed processing configurations to support large-scale, resource-intensive production environments.

Media type preservation plans

Archivematica maintains the original format of all ingested files to support migration and emulation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica groups file formats into media type preservation plans (e.g. text, audio, video, raster image, vector image, etc.). Archivematica's preservation formats must all be open standards. Additionally, the choice of formats is based on community best practices, availability of free and open-source normalization tools, and an analysis of the significant characteristics for each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.
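
One way to picture a media type preservation plan is as a lookup from media type to a preservation format and an access format. The sketch below assumes that shape; the class, the function and the format values are placeholders chosen for illustration and are not Archivematica's actual plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationPlan:
    media_type: str
    preservation_format: str
    access_format: str


# Placeholder values for illustration only; not Archivematica's actual plans.
PLANS = {
    "raster image": PreservationPlan("raster image", "TIFF", "JPEG"),
    "text": PreservationPlan("text", "PDF/A", "PDF"),
}


def normalization_targets(media_type):
    """Return (preservation_format, access_format), or None if no plan exists."""
    plan = PLANS.get(media_type)
    if plan is None:
        # No plan: only the original file is kept, as it always is.
        return None
    return plan.preservation_format, plan.access_format
```

Because the original file is always retained, a missing plan only means that no normalized copies are produced for that media type.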

The Archivematica media-type preservation plans will be moved to a structured, online format policy registry that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. The goal is to make this registry interoperable with PRONOM, and the forthcoming UDFR and Open Planets Foundation registries. Archivematica installations will use the registry to update their local, default policies and notify users if there has been a change in the risk status or migration options for these formats, allowing them to trigger a migration process using the available normalization tools. Users are free to determine their own preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it easy to add new normalization tools and customize the media-type preservation plans.

From SIP to AIP and DIP

The primary function of Archivematica is to process SIPs, apply media-type preservation plans and create repository-independent Archival Information Packages (AIP) using METS, PREMIS and BagIt. Archivematica is bundled with ICA-AtoM but is designed to upload Dissemination Information Packages (DIP), containing descriptive metadata and web-ready access copies, to any access system (e.g. DSpace, CONTENTdm, etc.).
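
To make the packaging step more concrete, the sketch below writes a bare BagIt-style layout: a data directory, a bagit.txt declaration and a checksum manifest. It is an illustration under stated assumptions rather than Archivematica's AIP routine: the function name is hypothetical, the version string and the choice of SHA-256 are assumptions, and the METS and PREMIS metadata of a real AIP are left out entirely.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a SHA-256 manifest."""
    bag = Path(bag_dir)
    data = bag / "data"
    # copytree creates bag_dir and data; bag_dir/data must not exist yet.
    shutil.copytree(source_dir, data)
    # The version string below is an assumption for this sketch.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Manifest entries are "<checksum>  <path relative to the bag root>".
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag
```

Since the result is just files and plain-text manifests on a file system, any tool that can read a directory and compute a checksum can verify it, which is what makes the package repository-independent.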

Lowering the barriers to best-practice digital preservation

The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough OAIS use case and process analysis to synthesize the specific, concrete steps that must be carried out to comply with the OAIS functional model from Ingest to Access. Wherever possible, these steps are assigned to software tools within the Archivematica system. If it is not possible to automate these steps in the current system iteration, they are documented and incorporated into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is being carried out, even in the early, pre-1.0 system releases. In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools.

All of the software, documentation and development infrastructure are available free of charge and released under GPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.

OAIS reference model
Archivematica micro-service structure
Archivematica 0.7-alpha: Running the pre-ingest script
Archivematica 0.7-alpha Dashboard