Difference between revisions of "Bottlenecks"

From Archivematica

Revision as of 10:51, 21 July 2011

A bottleneck is a point of congestion in a system, typically a place where resources are limited and work tends to slow down.

  • For more information on bottlenecks, see Wikipedia.

Processing Power

Archivematica uses its distributed, multiprocessing MCP system to mitigate the traditional problems of a processing system. However, this places greater importance on two other bottlenecks: network and disk activity.

Network

In Archivematica processing, networking comes into play for two key reasons:

  • distributing tasks
  • central file store accessed over the network

Distributing tasks and collecting results generates fairly light network traffic, but a congested network will still hurt system performance by delaying task assignment and the return of results.

We are currently investigating distributed file systems to reduce some of the delay of accessing files remotely. See the Distributed File System section below.
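
As a rough illustration of why file store access weighs more heavily on the network than task messaging, here is a back-of-the-envelope sketch. The link speed, message size, and file size are assumed values chosen for illustration, not Archivematica measurements, and the helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link.

    Ignores latency, protocol overhead, and congestion, so real
    transfers will take longer than this estimate.
    """
    return size_bytes * 8 / link_bits_per_second


# Assumed values, for illustration only.
LINK_BPS = 1_000_000_000      # a 1 Gbit/s network link
TASK_MESSAGE_BYTES = 2_000    # a small task assignment or result message
FILE_BYTES = 5 * 10**9        # a 5 GB file read from the central file store

task_time = transfer_seconds(TASK_MESSAGE_BYTES, LINK_BPS)
file_time = transfer_seconds(FILE_BYTES, LINK_BPS)

print(f"task message:  {task_time * 1e6:.0f} microseconds")
print(f"file transfer: {file_time:.0f} seconds")
```

Under these assumptions the file transfer takes several orders of magnitude longer than a task message, which is why remote file access is the more likely network bottleneck.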

Disk/Hard drive

RAID

RAID (redundant array of inexpensive disks) is a way of distributing the load of a file system across a set of drives. There are various RAID levels, each offering a different degree of redundancy.
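
To show how load can be spread across drives, the following is a minimal sketch of striping in the style of RAID 0 (which provides no redundancy), with a stripe unit of one block. The function name `locate_block` and the three-disk layout are assumptions made for this example; real RAID implementations use larger stripe units and, at other levels, add mirroring or parity.

```python
from typing import List, Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Consecutive logical blocks go to consecutive disks, so sequential
    reads and writes are shared between all drives in the set.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Layout of the first six logical blocks on an assumed three-disk set.
layout: List[Tuple[int, int]] = [locate_block(b, 3) for b in range(6)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```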

Distributed File System

Distributed file systems are arguably a subset of RAID. They are distributed over multiple machines, which together form a single file system. This has the potential to lighten the network load for processing.

We are looking at using a distributed file system with Archivematica. See Issue 669.

Ceph

Ceph is a distributed file system that is currently (July 2011) in alpha development. A beta 1.0 release is scheduled for 21 August 2011; see their roadmap.