Performance Improvements

Make indexing configurable

Make search an optional feature in Archivematica so that it can be run with Elasticsearch turned off.

  • The installation methods have been tested, documented, and released in Archivematica 1.7.0


Make capture output configurable

The next area selected for performance improvement was processing time, which can be reduced by changing how output streams are handled. In this phase, the automatic writing of standard output and standard error to the database was made configurable. When output capture is turned off, only a non-zero exit code (an error) is returned.

  • This option has been tested, documented for all deployment methods, and released in Archivematica 1.7.1
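
The idea behind configurable output capture can be illustrated with a minimal sketch. This is not Archivematica's own code: the function name run_task and its capture_output parameter are hypothetical, and the sketch only shows the general pattern of discarding a child process's streams so that the exit code is the only information passed back.

```python
import subprocess
import sys


def run_task(command, capture_output=True):
    """Run a command and return (exit_code, stdout, stderr).

    Hypothetical helper for illustration only. With capture_output=False
    both streams are discarded, so the exit code is the only signal the
    caller receives.
    """
    if capture_output:
        stream = subprocess.PIPE      # collect output so it could be stored
    else:
        stream = subprocess.DEVNULL   # throw output away, nothing to store
    result = subprocess.run(command, stdout=stream, stderr=stream, text=True)
    # When the streams are discarded, result.stdout and result.stderr are None.
    return result.returncode, result.stdout or "", result.stderr or ""


if __name__ == "__main__":
    # A child process that prints a line and then fails with exit code 3.
    cmd = [sys.executable, "-c", "import sys; print('done'); sys.exit(3)"]
    print(run_task(cmd, capture_output=True))
    print(run_task(cmd, capture_output=False))
```

In both calls the exit code is reported; only the first call has any output text that would need to be written to a database.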


Make performance metrics accessible via the API

Whenever a preservation task is performed, Archivematica records its timing information (start and end times) in the MySQL database. Columbia University Library wants to be able to measure the processing time (performance) of Archivematica packages and their component microservices so that they can identify bottlenecks, estimate package processing times, and make informed decisions about their configuration.

The problem is that the relevant timing information is not exposed via Archivematica’s API endpoints and is only partially exposed via its GUI. In addition, since this information is internal (not exposed via a public API), it is subject to change and users are therefore wary of building features or implementing workflows that make use of it.

We are therefore implementing an API endpoint that returns processing performance details for a specified package (i.e., a transfer or an AIP) divided by microservice group. This endpoint will return the following data:

  • Phase of processing (transfer or ingest)
  • Microservice group
  • CPU time
  • Number of tasks
  • Duration (wall time)

  • This feature will be released in the next minor release of Archivematica
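
To make the shape of such a report concrete, the following sketch aggregates per-task timing records by phase and microservice group. It is an assumption-laden illustration, not the endpoint's implementation: the function summarise_tasks, the record keys (phase, group, cpu_seconds, start, end), the output field names, and the sample data are all hypothetical, and wall time is assumed here to be the span from a group's earliest start to its latest end.

```python
from collections import defaultdict
from datetime import datetime


def summarise_tasks(tasks):
    """Group task records by (phase, microservice group) and total them.

    Hypothetical record shape: each task is a dict with the keys
    "phase", "group", "cpu_seconds", "start" and "end".
    """
    groups = defaultdict(
        lambda: {"tasks": 0, "cpu_seconds": 0.0, "start": None, "end": None}
    )
    for task in tasks:
        entry = groups[(task["phase"], task["group"])]
        entry["tasks"] += 1
        entry["cpu_seconds"] += task["cpu_seconds"]
        # Track the earliest start and latest end seen for this group.
        if entry["start"] is None or task["start"] < entry["start"]:
            entry["start"] = task["start"]
        if entry["end"] is None or task["end"] > entry["end"]:
            entry["end"] = task["end"]

    report = []
    for (phase, group), entry in sorted(groups.items()):
        report.append({
            "phase": phase,
            "microservice_group": group,
            "cpu_seconds": entry["cpu_seconds"],
            "tasks": entry["tasks"],
            # Assumption: wall time is the span covered by the group's tasks.
            "duration_seconds": (entry["end"] - entry["start"]).total_seconds(),
        })
    return report


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"phase": "transfer", "group": "Group A", "cpu_seconds": 1.5,
         "start": datetime(2018, 1, 1, 10, 0, 0), "end": datetime(2018, 1, 1, 10, 0, 5)},
        {"phase": "transfer", "group": "Group A", "cpu_seconds": 0.5,
         "start": datetime(2018, 1, 1, 10, 0, 2), "end": datetime(2018, 1, 1, 10, 0, 9)},
        {"phase": "ingest", "group": "Group B", "cpu_seconds": 4.0,
         "start": datetime(2018, 1, 1, 10, 1, 0), "end": datetime(2018, 1, 1, 10, 1, 30)},
    ]
    for row in summarise_tasks(sample):
        print(row)
```

Summing CPU time while measuring wall time as a span keeps the two figures distinct: tasks that run in parallel add to CPU time without lengthening the reported duration.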


Filter tasks based on the availability of an FPR rule

This feature was analysed but development work was not started. Artefactual has published the basic analysis in the Archivematica wiki, making it possible for other Archivematica users to consider working on this feature.

Run ‘remove unneeded files’ microservice once only

This is similar to the ‘generate thumbnails’ feature. It was analysed but development work was not started. Artefactual has published the basic analysis in the Archivematica wiki, making it possible for other Archivematica users to consider working on this feature.

Refactor mcpClient/clientScript structure to reduce database connections

This feature was analysed and initial exploratory development work was done as part of phase 2. Based on these initial findings, additional development work was funded by Jisc. The feature has now been developed and is currently in peer review. It is expected to be available in the next minor release (Archivematica 1.8.0).


Additional performance improvement development

Additional performance improvement work is being done by other Artefactual clients, primarily Jisc, including:

  • Reduce size of METS and improve XML handling
  • Analysis of performance metrics reporting
  • The analysis methods developed in phase 2 will be used to quantify the results of these features
  • These features are expected to be released in the future, after the next minor release (i.e. in 1.9.0 or later)