Bottlenecks

From Archivematica
Revision as of 17:13, 20 July 2011 by Joseph (talk | contribs)

A bottleneck is a point of congestion in a system, typically a place where resources are limited and the workflow is prone to slowing down.

  • For more information on bottlenecks, see Wikipedia.
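As a rough illustration of the definition above, the following minimal sketch treats a workflow as a chain of stages and reports the slowest one as the bottleneck. The stage names and throughput figures are hypothetical and chosen only for illustration; they are not measurements of Archivematica.

```python
def find_bottleneck(stage_throughput_mb_s):
    """Return (name, throughput) of the slowest stage in a serial pipeline."""
    # In a chain of stages, overall throughput is capped by the slowest stage.
    name = min(stage_throughput_mb_s, key=stage_throughput_mb_s.get)
    return name, stage_throughput_mb_s[name]


# Hypothetical figures in MB/s, for illustration only.
stages = {"processing": 400.0, "network": 110.0, "disk": 150.0}

name, rate = find_bottleneck(stages)
print(f"bottleneck: {name} at {rate} MB/s")
```

With these assumed figures, adding more processing power would not speed up the workflow, because the network stage limits the overall rate.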

Processing Power

Archivematica uses its distributed, multiprocessing MCP system to mitigate the traditional processing-power bottleneck. However, this places greater importance on two other bottlenecks: network and disk activity.

Network

In Archivematica processing, networking comes into play for two key reasons:

  • distributing tasks
  • central file store access

Disk/Hard drive

RAID

RAID (redundant array of inexpensive disks) is a way of distributing the load of a file system across a set of drives. There are various RAID levels, each offering a different degree of redundancy.
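To make the idea of spreading data over drives with redundancy more concrete, here is a minimal sketch of striping with a dedicated XOR parity block, roughly in the spirit of RAID 4. It is a toy model working on in-memory bytes, not how any real RAID controller or Archivematica storage is implemented; the function names, drive count, and block size are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_drives, block_size):
    """Split data into stripes.

    Each stripe is a list of n_data_drives data blocks followed by
    one parity block (the XOR of the data blocks in that stripe).
    """
    stripe_bytes = n_data_drives * block_size
    # Pad with zero bytes so the data fills a whole number of stripes.
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    stripes = []
    for start in range(0, len(padded), stripe_bytes):
        blocks = [
            padded[start + i * block_size: start + (i + 1) * block_size]
            for i in range(n_data_drives)
        ]
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Recover one lost block by XOR-ing all surviving blocks in the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical layout: 3 data drives plus 1 parity drive, 4-byte blocks.
stripes = stripe_with_parity(b"archivematica transfer", n_data_drives=3, block_size=4)
original = stripes[0][1]
recovered = rebuild_block(stripes[0], lost_index=1)
print(original == recovered)
```

Each stripe places one block on each drive, so reads and writes are shared among the drives, and any single lost block in a stripe can be rebuilt from the others. Real RAID levels differ in where parity is stored and in how many drive failures they tolerate.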

Distributed File System

Distributed file systems are arguably a subset of RAID: storage is spread over multiple machines that together form a single file system. This has the potential to lighten the network load during processing.

We are looking at using a distributed file system with Archivematica. See Issue 669.

Ceph

Ceph is a distributed file system that is currently (July 2011) in alpha development. A 1.0 beta release is scheduled for 08/21/2011; see their roadmap.