[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Optimization

<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information.</div>

[[Category:Development documentation]]

= Introduction =

Archivematica is a complex application with many moving parts; it can be deployed in a wide variety of configurations and used to process a wide variety of workflows. Optimization therefore has many aspects and many possible approaches. This page documents some common use cases and techniques for improving performance.

== Definition ==

One measure of performance in an Archivematica installation is throughput: the raw number of files, or gigabytes of data, that can be processed from initial transfer to safe storage as an AIP per unit of time. Depending on the composition of the original materials, the configuration of Archivematica and its components, and the hardware resources available, throughput, measured in GB processed per hour, can vary dramatically. It is difficult to accommodate every possible permutation, so this discussion starts from specific use cases and examines options for improving performance in each.
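The throughput calculation itself is simple arithmetic. A minimal sketch, assuming you record the total bytes processed and the elapsed wall-clock time for a transfer (the 110 GB / 4 hour figures below are illustrative, not measured values from the source):

```python
def throughput_gb_per_hour(total_bytes, elapsed_seconds):
    """Throughput in GB (10^9 bytes) processed per hour."""
    gigabytes = total_bytes / 1e9
    hours = elapsed_seconds / 3600
    return gigabytes / hours

# Hypothetical example: a 110 GB transfer that took 4 hours
# from initial transfer to stored AIP.
rate = throughput_gb_per_hour(110e9, 4 * 3600)
# rate == 27.5 (GB/hour)
```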

== Measuring Performance ==

There are a number of ways to measure performance. One is counting the number of files stored in AIPs each day, although there can be large delays while the system is idle, for example when Archivematica is waiting on user input. Rough counts of GB in storage per week or month are still useful, because they measure actual results.
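A rough weekly count can be gathered by summing the sizes of recently stored packages. A sketch along those lines, where the storage root path and the `.7z` package suffix are assumptions to be adjusted to your AIP Storage Location and packaging settings:

```python
import os
import time

def stored_bytes_since(storage_root, days=7, suffix=".7z"):
    """Sum sizes of packages under storage_root modified in the last `days` days.

    `storage_root` and the `.7z` suffix are assumptions; adjust them to
    match your AIP Storage Location and compression settings.
    """
    cutoff = time.time() - days * 86400
    total = 0
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            st = os.stat(os.path.join(dirpath, name))
            if st.st_mtime >= cutoff:
                total += st.st_size
    return total
```

Dividing the result by 1e9 gives the "GB in storage per week" figure described above.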

= Use Cases =

== Single Large File per Transfer ==

=== Materials ===
The user has large files, such as video files ranging from 10 GB to several hundred GB each. Each Transfer consists of a single file.

=== Hardware Configuration ===

This example assumes separate file systems for the Transfer Source Location, the Processing Location, and the AIP Storage Location. These could be mounted over NFS or CIFS; the main point is that separate hardware provides each file system.

The Archivematica server has 21 CPU cores and 192 GB of RAM. The file systems are served from a high-capacity Isilon system.

=== Example Performance ===

This example starts with a sample transfer consisting of a single .mov file, 110 GB in size.
The Transfer Source Location is configured in the Storage Service in Space A, which corresponds to a CIFS mount.
The Processing Location is configured in Space B, which corresponds to an NFS mount.
The AIP Storage Location is configured in Space A.

Based on timing sample transfers, there are two main bottlenecks in this scenario:

# checksumming
# moving/copying files
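To quantify the checksumming bottleneck in isolation, it can help to time a hash over the large file directly on the processing file system. A minimal sketch; the choice of SHA-256 and the 1 MiB read size are illustrative, not Archivematica's actual settings:

```python
import hashlib
import time

def checksum_rate(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks; return (hex digest, throughput in MB/s)."""
    h = hashlib.new(algorithm)
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
            total += len(chunk)
    elapsed = time.monotonic() - start
    return h.hexdigest(), total / 1e6 / max(elapsed, 1e-9)
```

Comparing the rates reported for different algorithms (e.g. `"sha256"` versus `"md5"`) on the same file can indicate whether, where local policy allows, choosing a faster checksum algorithm would meaningfully reduce total processing time.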
Latest revision as of 16:01, 11 February 2020