From Archivematica
Revision as of 11:51, 21 July 2011 by Joseph

A bottleneck is a point of congestion in a system, typically where resources are limited, at which the workflow is prone to slowing down.

  • For more information on bottlenecks, see Wikipedia.


Processing Power

Archivematica uses its distributed, multiprocessing MCP system to mitigate the traditional processing-power bottleneck of a processing system. However, this places greater importance on two other bottlenecks: network and disk activity.


Network

In Archivematica processing, networking comes into play for two key reasons:

  • distributing tasks
  • central file store accessed over the network

Distributing the tasks and collecting the results generates fairly light network traffic. If the network is congested, however, system performance will suffer, because task assignment and the return of results are both slowed.
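As a rough illustration of why task messaging is light compared with reading from the central file store, the following Python sketch estimates the wall-clock time for a batch of tasks. It is a back-of-envelope model, not part of Archivematica: the function name, its parameters and all the numbers are hypothetical, and it assumes identical tasks that spread evenly over the workers and do not compete with each other for bandwidth.

```python
def estimate_batch_seconds(num_tasks, compute_s, rtt_s, file_mb,
                           bandwidth_mb_s, workers):
    """Rough wall-clock estimate, in seconds, for a batch of identical tasks.

    Hypothetical model for illustration only.
    """
    if workers < 1 or bandwidth_mb_s <= 0:
        raise ValueError("workers must be >= 1 and bandwidth_mb_s must be > 0")
    # Two small messages per task: the assignment and the returned result.
    messaging_s = 2 * rtt_s
    # Each task reads its file from the central store over the network.
    transfer_s = file_mb / bandwidth_mb_s
    per_task_s = compute_s + messaging_s + transfer_s
    # Assumes an even spread over workers and no contention for bandwidth.
    return num_tasks * per_task_s / workers


# Made-up figures: the same batch on a quiet and on a congested network.
quiet = estimate_batch_seconds(1000, 2.0, 0.001, 50, 100, 4)
congested = estimate_batch_seconds(1000, 2.0, 0.05, 50, 10, 4)
print(f"quiet: {quiet:.1f} s, congested: {congested:.1f} s")
```

With these made-up figures, the messaging term stays small in both cases, while the file transfer term grows from a fraction of the compute time to more than double it once bandwidth drops.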

We are currently investigating distributed file systems to reduce some of the delay in accessing files remotely. See the Distributed File System section below.

Disk/Hard drive


RAID (redundant array of inexpensive disks) is a way of distributing the load of a file system across a set of drives. There are various forms of RAID, offering different levels of redundancy.
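To make the idea of spreading load across drives concrete, here is a minimal Python sketch of striping, the layout used by RAID 0, which offers no redundancy. It is a simplification for illustration: the function name is hypothetical, and real arrays stripe in larger chunks rather than single blocks.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Simplified RAID 0 style striping: consecutive blocks go to
    consecutive disks in round-robin order.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must not be negative")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive blocks spread over three disks.
layout = [locate_block(block, 3) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"block {block} -> disk {disk}, offset {offset}")
```

Because neighbouring blocks sit on different drives, a large sequential read can be served by several drives at once. Forms of RAID with redundancy add mirrored or parity data on top of a layout like this.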

Distributed File System

Distributed file systems are arguably a subset of RAID. They are spread over multiple machines, which together form a single file system. This has the potential to lighten the network load during processing.

We are looking at using a distributed file system with Archivematica. See Issue 669.


Ceph is a distributed file system that is currently (July 2011) in alpha development. A 1.0 beta release is scheduled for 21 August 2011; see their roadmap.
