Difference between revisions of "Overview"

From Archivematica
(Remove outline table, update for present tense.)
 
(16 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
[[Main Page]] > [[Documentation]] > [[Technical Architecture]] > Overview
 
[[Main Page]] > [[Documentation]] > [[Technical Architecture]] > Overview
  
{|style="width:95%; border="0"
+
==Open source OAIS==
|-valign="top"
+
Archivematica provides an integrated suite of free and open-source tools that allows users to process digital objects from [[Micro-services#Archivematica_Micro-services|ingest to archival storage and access]] in [[Requirements|compliance]] with the  [http://en.wikipedia.org/wiki/Open_Archival_Information_System ISO-OAIS]  functional model and other [[wikipedia:Digital preservation|digital preservation]] standards and best practices. All of the Archivematica code and documentation is released under AGPL and Creative Commons open-source licenses.
|style="width: 70%; padding: 0.5em 1em 1em; color: rgb(0, 0, 0);"|
 
  
==Open Source OAIS==
+
[[File:OAIS.png|thumb|left|500px|OAIS reference model]]
Archivematica provides an integrated suite of free and open-source tools that allows users to process digital objects from ingest to archival storage and access in compliance with the  [http://en.wikipedia.org/wiki/Open_Archival_Information_System ISO-OAIS]  functional model and other [[wikipedia:Digital preservation|digital preservation]] standards and best practices. All of the Archivematica code and documentation is released under AGPL and Creative Commons open-source licenses.
 
  
==Micro-Services design pattern==
+
==Micro-services design pattern==
 
Archivematica implements a [http://www.cdlib.org/services/uc3/curation/ micro-service] approach to digital preservation. The Archivematica micro-services are granular system tasks which operate on a conceptual entity that is equivalent to an OAIS information package: Submission Information Package (SIP), Archival Information Package (AIP), Dissemination Information Package (DIP). The physical structure of an information package will include files, checksums, logs, submission documentation, XML metadata, etc..  
 
Archivematica implements a [http://www.cdlib.org/services/uc3/curation/ micro-service] approach to digital preservation. The Archivematica micro-services are granular system tasks which operate on a conceptual entity that is equivalent to an OAIS information package: Submission Information Package (SIP), Archival Information Package (AIP), Dissemination Information Package (DIP). The physical structure of an information package will include files, checksums, logs, submission documentation, XML metadata, etc..  
  
These information packages are processed using a series of micro-services. Micro-services are provided by a combination of Archivematica Python scripts and one or more of the free, open-source [[External tools|software tools]] bundled in the Archivematica system. Each micro-service results in a success or error state and the information package is processed accordingly by the next micro-service. There are a variety of mechanisms used to connect the various micro-services together into complex, custom workflows. Resulting in a complete ingest to access system.
+
These information packages are processed using a series of micro-services. Micro-services are provided by a combination of Archivematica Python scripts and one or more of the free, open-source [[External tools|software tools]] bundled in the Archivematica system. Each micro-service results in a success or error state and the information package is processed accordingly by the next micro-service. There are a variety of mechanisms used to connect the various micro-services together into complex, custom workflows. Micro-services can be distributed to processing clusters for highly scalable configurations.
 
 
Archivematica implements a [[Micro-services#Archivematica_Micro-services|default ingest to access workflow]] that is [[Requirements|compliant with the ISO-OAIS]] functional model. Micro-services can be distributed to processing clusters for highly scalable configurations. The Thunar file manager and a web-based dashboard allow users to process, monitor and control the Archivematica workflow processes. Over the course of the 0.9- and 1.0-beta releases, the Thunar file manager will be incrementally replaced by an in-browser file manager for multi-stage package review, transfer analysis and SIP preparation.
 
  
 
==Dashboard==
 
==Dashboard==
  
The Archivematica dashboard is a web-based tool developed using Python-based Django MVC framework. It provides a multi-user interface that will report on the status of system events and make it simpler to control and trigger specific micro-services. This interface allows users to easily add or edit metadata, coordinate AIP and DIP storage and provide preservation planning information. Notifications include error reports, monitoring of MCP tasks and manual approvals in the workflow. In coming releases, the dashboard will support a transfer backlog linked to accession data as well as indexing, analysis, arrangement and minimal description of transfer(s) into SIP(s). An administration area will allow users to manage storage locations, configuration of micro-services, alteration of preservation plans and user access levels.
+
[[Image:CreateSIPs-10.png|500px|thumb|left|In Dashboard: A transfer that is has completed micro-service jobs in the transfer workflow to be packaged into a SIP, arranged in Ingest or stored in the backlog]]
  
==Single install==
+
The web dashboard allow users to process, monitor and control the Archivematica workflow processes. It is developed using Python-based Django MVC framework. The Dashboard provides a multi-user interface that reports on the status of system events and makes it simpler to control and trigger specific micro-services. This interface allows users to easily add or edit metadata, coordinate AIP and DIP storage and provide preservation planning information. Notifications include error reports, monitoring of MCP tasks and manual approvals in the workflow. The dashboard also supports a transfer backlog linked to accession data as well as indexing, analysis, arrangement and minimal description of transfer(s) into SIP(s). An administration area allows users to manage storage locations, configuration of micro-services, alteration of preservation plans and user access levels.
Using the latest in virtualization technology, each release of the Archivematica system packages a customized Xubuntu environment as a [http://en.wikipedia.org/wiki/Virtual_appliance virtual appliance], making it possible to run on top of any consumer-grade hardware and operating system. This means the entire [[External tools|suite of digital preservation tools]] is now available from one simple installation. Archivematica can also be installed directly on dedicated hardware via its own Ubuntu repository. Its client/server processing architecture allows it to be deployed in multi-node, distributed processing configurations to support large-scale, resource-intensive production environments.
 
  
 
==Format policies==
 
==Format policies==
 
Archivematica maintains the original format of all ingested files to support migration and emulation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica groups file formats into [[Media_type_preservation_plans|format policies]]  (e.g. text, audio, video, raster image, vector image, etc.). Archivematica's preservation formats must all be open standards. Additionally, the choice of formats is based on community best practices, availability of free and open-source normalization tools, and an analysis of the significant characteristics for each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.  
 
Archivematica maintains the original format of all ingested files to support migration and emulation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica groups file formats into [[Media_type_preservation_plans|format policies]]  (e.g. text, audio, video, raster image, vector image, etc.). Archivematica's preservation formats must all be open standards. Additionally, the choice of formats is based on community best practices, availability of free and open-source normalization tools, and an analysis of the significant characteristics for each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.  
  
For the 1.0 production release, Archivematica format policies will be moved to a structured, online format policy registry ([[Format_policy_registry_requirements|FPR]]) that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. The goal is to make this registry interoperable with [http://www.nationalarchives.gov.uk/PRONOM/Default.aspx PRONOM], the [http://corereg.arts.gla.ac.uk/PlanetsCoreRegistry/welcome.html Planets Core Registry] and/or the forthcoming [http://www.udfr.org/ Universal Digital Format Registry] (UDFR). Archivematica installations will use the registry to update their local, default policies and notify users if there has been a change in the risk status or migration options for these formats, allowing them to trigger a migration process using the available normalization tools. Users are free to determine their own format preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it easy to add new normalization tools and customize local format policies.
+
Archivematica format policies are managed in a structured, online format policy registry ([[Format_policy_registry_requirements|FPR]]) that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. This registry is synced with [http://www.nationalarchives.gov.uk/PRONOM/Default.aspx PRONOM], and the goal is to ultimately integrate other registries like the [http://corereg.arts.gla.ac.uk/PlanetsCoreRegistry/welcome.html Planets Core Registry] and/or the [http://www.udfr.org/ Universal Digital Format Registry] (UDFR). Archivematica installations use the FPR to update their local, default policies. Users are free to determine their own format preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it simple to add new normalization tools and customize local format policies.
  
 
==From Transfer to SIP to AIP and DIP==
 
==From Transfer to SIP to AIP and DIP==
The primary function of Archivematica is to process digital transfers (accessioned digital objects), turn them into SIPs, apply format policies and create high-quality, repository-independent Archival Information Packages (AIP) using [http://www.loc.gov/standards/mets/ METS], [http://www.loc.gov/standards/premis/ PREMIS] and [https://confluence.ucop.edu/download/attachments/16744580/BagItSpec.pdf?version=1 Bagit]. Archivematica is bundled with ICA-AtoM but is designed to upload Dissemination Information Packages (DIP), containing descriptive metadata and web-ready access copies, to any access system (e.g. Dspace, ContentDM, etc.).
+
The primary function of Archivematica is to process digital transfers (accessioned digital objects), turn them into Submission Information Packages (SIPs), apply format policies and create high-quality, repository-independent Archival Information Packages (AIP) using [http://www.loc.gov/standards/mets/ METS], [http://www.loc.gov/standards/premis/ PREMIS] and [https://confluence.ucop.edu/download/attachments/16744580/BagItSpec.pdf?version=1 Bagit]. Archivematica is bundled with AtoM, but is designed to upload Dissemination Information Packages (DIP), containing descriptive metadata and web-ready access copies, to several access systems (e.g. Dspace, ContentDM, etc.).
 +
 
 +
 
 +
[[File:AMarch.png|500px|left|thumb|Archivematica Ingest infrastructure overview]]
  
 
==Lowering the barriers to best-practice digital preservation==
 
==Lowering the barriers to best-practice digital preservation==
Line 35: Line 33:
 
All of the software, documentation and development infrastructure are available free of charge and released under AGPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.
 
All of the software, documentation and development infrastructure are available free of charge and released under AGPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.
  
|style="padding: 0.5em 1em 1em; color: rgb(0, 0, 0);"|
+
[[Contribute code|Code contributions]], [https://projects.artefactual.com/projects/archivematica bug reports], wiki [[Special:UserLogin|documentation updates]] along with questions and feedback on the [http://groups.google.ca/group/archivematica discussion list] are strongly encouraged and welcomed.
 
 
[[File:OAIS.png|thumb|left|300px|OAIS reference model]]
 
 
 
[[File:pre-ingest-1.png|thumb|left|300px|Archivematica 0.7-alpha: Running the pre-ingest script in Thunar]]
 
  
[[File:0.8_IngestTab.png |thumb|left|300px|Archivematica 0.8-alpha Dashboard]]
+
Commercial licenses and commercial use of the Archivematica name and logo [[trademark|trademarks]] may be negotiated with [http://artefactual.com Artefactual Systems] on a case-by-case basis.
  
[[File:Archivematica-0.8-beta-architecture.png|thumb|left|300px|Archivematica 0.8-alpha architecture]]
+
== Agile development ==
|}
 
  
 +
Digital preservation systems must implement strategies that deal with technology obsolescence and incompatibility to ensure that digital objects remain authentic, accessible and useable for future use. The technologies that create digital objects and the technology available to manage them are constantly changing. Therefore, the Archivematica project has established an [[wikipedia:Agile software development|agile software development]] methodology to manage the perpetual maintenance and development of the system. This methodology is focused on rapid, iterative release cycles, each of which improves upon the system's [[Technical_Architecture|architecture]], [[requirements]], [[External_tools|tools]], [[documentation]], and [[development]] resources.
  
  
  
 
__NOTOC__
 
__NOTOC__

Latest revision as of 19:41, 9 March 2017


Open source OAIS

Archivematica provides an integrated suite of free and open-source tools that allows users to process digital objects from ingest to archival storage and access in compliance with the ISO-OAIS functional model and other digital preservation standards and best practices. All of the Archivematica code and documentation are released under AGPL and Creative Commons open-source licenses.

OAIS reference model

Micro-services design pattern

Archivematica implements a micro-service approach to digital preservation. The Archivematica micro-services are granular system tasks that operate on a conceptual entity equivalent to an OAIS information package: Submission Information Package (SIP), Archival Information Package (AIP) or Dissemination Information Package (DIP). The physical structure of an information package includes files, checksums, logs, submission documentation, XML metadata, etc.

These information packages are processed using a series of micro-services. Micro-services are provided by a combination of Archivematica Python scripts and one or more of the free, open-source software tools bundled in the Archivematica system. Each micro-service results in a success or error state, and the information package is processed accordingly by the next micro-service. A variety of mechanisms connect the micro-services together into complex, custom workflows. Micro-services can be distributed to processing clusters for highly scalable configurations.
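
The success-or-error hand-off between micro-services described above can be sketched roughly as follows. This is a minimal illustration only, not Archivematica's actual code: the Package class, the two example services and the run_chain function are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Package:
    """Hypothetical stand-in for an information package (SIP, AIP or DIP)."""
    name: str
    files: List[str]
    log: List[str] = field(default_factory=list)


# In this sketch a micro-service takes a package and returns
# True for a success state or False for an error state.
MicroService = Callable[[Package], bool]


def verify_not_empty(package: Package) -> bool:
    """Hypothetical service: fail if the package contains no files."""
    return len(package.files) > 0


def record_file_count(package: Package) -> bool:
    """Hypothetical service: write the file count to the package log."""
    package.log.append(f"{len(package.files)} file(s) counted")
    return True


def run_chain(package: Package, services: List[MicroService]) -> str:
    """Run services in order and stop at the first error state."""
    for service in services:
        ok = service(package)
        package.log.append(f"{service.__name__}: {'success' if ok else 'error'}")
        if not ok:
            return "error"
    return "success"
```

Calling run_chain(Package("transfer-1", ["a.txt"]), [verify_not_empty, record_file_count]) returns "success", while an empty package stops at the first service and returns "error". A real workflow would route the package to a different next step on error rather than simply stopping.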

Dashboard

In Dashboard: A transfer that has completed micro-service jobs in the transfer workflow, ready to be packaged into a SIP, arranged in Ingest or stored in the backlog

The web dashboard allows users to process, monitor and control the Archivematica workflow processes. It is developed using the Python-based Django MVC framework. The dashboard provides a multi-user interface that reports on the status of system events and makes it simpler to control and trigger specific micro-services. This interface allows users to add or edit metadata, coordinate AIP and DIP storage and provide preservation planning information. Notifications include error reports, monitoring of MCP tasks and manual approvals in the workflow. The dashboard also supports a transfer backlog linked to accession data as well as indexing, analysis, arrangement and minimal description of transfer(s) into SIP(s). An administration area allows users to manage storage locations, configuration of micro-services, alteration of preservation plans and user access levels.

Format policies

Archivematica maintains the original format of all ingested files to support migration and emulation strategies. However, the primary preservation strategy is to normalize files to preservation and access formats upon ingest. Archivematica groups file formats into format policies (e.g. text, audio, video, raster image, vector image, etc.). Archivematica's preservation formats must all be open standards. Additionally, the choice of formats is based on community best practices, availability of free and open-source normalization tools, and an analysis of the significant characteristics for each media type. The choice of access formats is based largely on the ubiquity of web-based viewers for the file format.

Archivematica format policies are managed in a structured, online format policy registry (FPR) that brings together format identification information with significant characteristic analysis, risk assessments and normalization tool information to arrive at default preservation format and access format policies for Archivematica. This registry is synced with PRONOM, and the goal is to ultimately integrate other registries like the Planets Core Registry and/or the Universal Digital Format Registry (UDFR). Archivematica installations use the FPR to update their local, default policies. Users are free to determine their own format preservation policies, whether based on alternate institutional policies or developed through the use of a formal preservation policy tool like Plato. The system is configured to make it simple to add new normalization tools and customize local format policies.
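
The relationship between default format policies and local customizations can be sketched as a simple lookup. This is an assumption-laden illustration, not the FPR data model: the dictionary layout, the function names and the example target formats are placeholders chosen for the sketch and are not Archivematica's actual defaults.

```python
# Placeholder defaults: one entry per format policy group.
# The target formats here are illustrative assumptions only.
DEFAULT_POLICIES = {
    "raster image": {"preservation": "tiff", "access": "jpeg"},
    "audio": {"preservation": "wav", "access": "mp3"},
}


def resolve_policy(group, local_overrides=None):
    """Return the policy for a group, with local overrides applied on top."""
    policy = dict(DEFAULT_POLICIES.get(group, {}))
    policy.update((local_overrides or {}).get(group, {}))
    return policy


def plan_normalization(original_format, group, local_overrides=None):
    """Describe what is kept and what is produced for one ingested file.

    The original format is always retained; preservation and access
    targets are None when no policy exists for the group.
    """
    policy = resolve_policy(group, local_overrides)
    return {
        "original": original_format,
        "preservation": policy.get("preservation"),
        "access": policy.get("access"),
    }
```

For example, plan_normalization("bmp", "raster image", {"raster image": {"access": "png"}}) keeps the original, retains the default preservation target and swaps only the access target, which mirrors the idea that an institution can customize local policies without replacing the defaults wholesale.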

From Transfer to SIP to AIP and DIP

The primary function of Archivematica is to process digital transfers (accessioned digital objects), turn them into Submission Information Packages (SIPs), apply format policies and create high-quality, repository-independent Archival Information Packages (AIPs) using METS, PREMIS and BagIt. Archivematica is bundled with AtoM, but is designed to upload Dissemination Information Packages (DIPs), containing descriptive metadata and web-ready access copies, to several access systems (e.g. DSpace, CONTENTdm, etc.).
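
Part of what makes a package repository-independent is that its checksums travel with its files. BagIt payload manifests list one checksum and one file path per line, and the sketch below shows that idea using in-memory content. It is not a complete BagIt implementation and not Archivematica's own code; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def manifest_lines(files):
    """Build manifest lines from a dict of relative path -> bytes content.

    Each line pairs a SHA-256 checksum with a path, in the style of a
    BagIt payload manifest.
    """
    lines = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        lines.append(f"{digest}  {path}")
    return lines


def verify(files, lines):
    """Return the paths that are missing or no longer match the manifest."""
    expected = {}
    for line in lines:
        digest, path = line.split(maxsplit=1)
        expected[path] = digest
    failed = []
    for path in sorted(expected):
        if path not in files:
            failed.append(path)
        elif hashlib.sha256(files[path]).hexdigest() != expected[path]:
            failed.append(path)
    return failed
```

Running verify against unchanged content returns an empty list; altering or removing a file causes its path to be reported, which is the kind of fixity check a stored package makes possible.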


Archivematica Ingest infrastructure overview

Lowering the barriers to best-practice digital preservation

The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough OAIS use case and process analysis to synthesize the specific, concrete steps that must be carried out to comply with the OAIS functional model from Ingest to Access. Through deployment experiences and user feedback, the project has expanded beyond OAIS to address analysis and arrangement of transferred digital objects into SIPs and to allow for archival appraisal at multiple decision points. Wherever possible, these requirements are assigned to software tools within the Archivematica system. If it is not possible to automate these steps in the current system iteration, they are incorporated and documented into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is being carried out, even in the early, pre-1.0 system releases. In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools. For institutions that want technical assistance to install and customize Archivematica, optional technical support services are provided by Artefactual Systems.

All of the software, documentation and development infrastructure are available free of charge and released under AGPL and Creative Commons licenses to give users the freedom to study, adapt and re-distribute these resources as best suits them. Rather than spend precious funding on proprietary software licenses that restrict these freedoms, the Archivematica project encourages memory institutions tackling the challenges of digital preservation to pool their financial and technical resources in projects like Archivematica to maximize their long-term investments for the benefit of their colleagues, users and professional community as a whole.

Code contributions, bug reports and wiki documentation updates, along with questions and feedback on the discussion list, are strongly encouraged and welcomed.

Commercial licenses and commercial use of the Archivematica name and logo trademarks may be negotiated with Artefactual Systems on a case-by-case basis.

Agile development

Digital preservation systems must implement strategies that deal with technology obsolescence and incompatibility to ensure that digital objects remain authentic, accessible and usable in the future. The technologies that create digital objects and the technologies available to manage them are constantly changing. Therefore, the Archivematica project has established an agile software development methodology to manage the perpetual maintenance and development of the system. This methodology is focused on rapid, iterative release cycles, each of which improves upon the system's architecture, requirements, tools, documentation, and development resources.