Difference between revisions of "Bottlenecks"

(Created page with 'Category:Development documentation A bottleneck refers to a point of congestion in a system, typically a place of limited resources, where workflow is prone to slow. * For mo...')
 
 
(3 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
[[Category:Development documentation]]
 
 +
 +
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
 +
 
A bottleneck refers to a point of congestion in a system, typically a place of limited resources, where workflow is prone to slow.
 
 
* For more information on bottlenecks see [http://en.wikipedia.org/wiki/Bottleneck wikipedia.]
 
Line 9: Line 12:
 
In Archivematica processing, networking comes into play for two key reasons:
 
 
* distributing tasks
 
* central file store access
+
* central file store accessed over the network
 +
 
 +
Distributing the tasks and getting the results is fairly light traffic on the network, but if the network is congested, it will hurt the performance of the system by slowing task assignment and results.
 +
 
 +
We are currently investigating distributed file systems, to avert some of the delay of accessing files remotely. See below.
  
 
= Disk/Hard drive =
 
 +
Hard drive access is one of the key bottlenecks in the Archivematica system. All of the operations performed on the objects require reading of the objects from the drive. There are a number of ways to improve disk read performance in a system.
  
 
== RAID ==
 
 +
RAID (redundant array of inexpensive disks) is a way of distributing the load of a file system on a set of drives. There are various forms of RAID, with different levels of redundancy.
 +
* For more information on RAIDs see [http://en.wikipedia.org/wiki/RAID wikipedia.]
  
 
== Distributed File System ==
 
 +
Distributed file systems are arguably a sub-set of RAIDs. They are distributed over multiple machines, to form a single file system. This has the potential to lighten the Network load for processing.
 +
 +
We are looking at using a distributed file system with archivematica. See [http://code.google.com/p/archivematica/issues/detail?id=669 Issue 669.]
 +
 +
=== Ceph ===
 +
Ceph is a distributed file system, which is currently (July 2011) under alpha development. They have a beta 1.0 release scheduled for release 08/21/2011 see their [http://tracker.newdream.net/projects/ceph/roadmap roadmap.]

Latest revision as of 15:41, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

A bottleneck refers to a point of congestion in a system, typically a place of limited resources, where workflow is prone to slow.

  • For more information on bottlenecks, see Wikipedia (http://en.wikipedia.org/wiki/Bottleneck).

Processing Power

Archivematica uses its distributed, multiprocessing MCP system to mitigate the processing-power limits that traditionally constrain a processing system. However, this places more weight on two other bottlenecks: network and disk activity.

Network

In Archivematica processing, networking comes into play for two key reasons:

  • distributing tasks
  • central file store accessed over the network

Distributing tasks and collecting results generates fairly light network traffic, but a congested network still hurts system performance by delaying both task assignment and the return of results.
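To make the contrast between the two kinds of traffic concrete, here is a rough back-of-envelope sketch. It assumes a simple model in which transfer time is latency plus payload size divided by bandwidth; the function name and all of the figures (message size, object size, link speed, latency) are hypothetical and were chosen only for illustration, not measured on an Archivematica installation.

```python
def transfer_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimate transfer time as latency plus payload size over bandwidth.

    This is a deliberately simple model: it ignores protocol overhead,
    congestion, and contention with other traffic.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + payload_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    link = 100 * 1000 * 1000 / 8        # a 100 Mbit/s link, in bytes per second
    task_message = 2 * 1024             # a small task assignment or result message
    object_file = 500 * 1024 * 1024     # a large object read from the central store

    print("task message: %.4f s" % transfer_seconds(task_message, link, 0.001))
    print("object file:  %.1f s" % transfer_seconds(object_file, link, 0.001))
```

Under these assumed numbers the task message is dominated by latency, while the object transfer is dominated by bandwidth, which is why remote access to the central file store matters more than task distribution.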

We are currently investigating distributed file systems to reduce some of the delay of accessing files remotely. See the Distributed File System section below.

Disk/Hard drive

Hard drive access is one of the key bottlenecks in the Archivematica system. Every operation performed on an object requires reading that object from the drive. There are a number of ways to improve disk read performance in a system.
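Before tuning anything, it can help to measure how fast a drive actually delivers data. The following is a minimal sketch of a sequential read timer, not part of Archivematica; the function names and the 1 MiB chunk size are assumptions made for this example. Note that the operating system may serve a recently written file from its page cache, so the figure for a freshly created sample file can be much higher than the drive's real throughput.

```python
import os
import tempfile
import time


def read_throughput(path, chunk_size=1024 * 1024):
    """Read the file at `path` sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def make_sample_file(size_bytes):
    """Create a temporary file of random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    sample = make_sample_file(16 * 1024 * 1024)
    try:
        size, seconds = read_throughput(sample)
        if seconds > 0:
            print("%.1f MiB/s" % (size / (1024 * 1024) / seconds))
    finally:
        os.remove(sample)
```

For a more realistic number, point `read_throughput` at an existing large file on the drive under test rather than at a freshly written sample.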

RAID

RAID (redundant array of inexpensive disks) is a way of distributing the load of a file system across a set of drives. There are various forms of RAID, each offering a different level of redundancy.
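As an illustration of how striping and redundancy can work together, the sketch below splits data into blocks for several data drives and adds one XOR parity block per stripe, so that any single lost block in a stripe can be rebuilt from the others. This is a toy model with a dedicated parity position, written for this page; the function names are hypothetical, and real RAID implementations operate at the block-device level with many more details.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_drives, block_size):
    """Split `data` into stripes of `n_data_drives` blocks plus one parity block.

    The data is zero-padded to a whole number of stripes. Each stripe is a
    list whose last element is the XOR parity of the data blocks before it.
    """
    stripe_bytes = n_data_drives * block_size
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    stripes = []
    for offset in range(0, len(padded), stripe_bytes):
        blocks = [
            padded[offset + i * block_size:offset + (i + 1) * block_size]
            for i in range(n_data_drives)
        ]
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Rebuild the block at `lost_index` by XOR-ing the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripes = stripe_with_parity(b"Archivematica objects", 3, 4)
    # Pretend the second drive failed and recover its block in the first stripe.
    assert rebuild_block(stripes[0], 1) == stripes[0][1]
```

Reads can be spread over the data drives in a stripe, which is where the load distribution comes from; the parity block is the price paid for surviving one drive failure.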

Distributed File System

Distributed file systems are arguably a subset of RAID: storage is spread over multiple machines that together form a single file system. This has the potential to lighten the network load for processing.

We are looking at using a distributed file system with Archivematica. See Issue 669 (http://code.google.com/p/archivematica/issues/detail?id=669).

Ceph

Ceph is a distributed file system that is currently (July 2011) under alpha development. A beta 1.0 release is scheduled for August 21, 2011; see their roadmap (http://tracker.newdream.net/projects/ceph/roadmap).