Performance Improvements

From Archivematica
Revision as of 15:24, 8 June 2018 by Kelly

Make indexing configurable

Make search an optional feature in Archivematica so that it can run with Elasticsearch turned off.

  • This option has been tested, documented for all installation methods, and released in Archivematica 1.7.0
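To illustrate what "search as an optional feature" means in practice, here is a minimal sketch of a microservice step that only contacts the search index when a flag allows it. The `SEARCH_ENABLED` variable, `search_enabled`, and `index_package` are hypothetical names chosen for this example, not Archivematica's actual configuration keys or functions; check the deployment documentation for the real setting.

```python
import os
from typing import Callable, Mapping, Optional


def search_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a hypothetical SEARCH_ENABLED flag; search stays on by default."""
    env = os.environ if env is None else env
    return env.get("SEARCH_ENABLED", "true").strip().lower() in {"1", "true", "yes", "on"}


def index_package(package_uuid: str, index: Callable[[str], None], enabled: bool) -> bool:
    """Send a package to the search index only when search is enabled.

    Returns True if the package was indexed, False if indexing was skipped.
    """
    if not enabled:
        # Search disabled: skip the indexing call entirely, so processing
        # does not depend on an Elasticsearch server being reachable.
        return False
    index(package_uuid)
    return True
```

Passing the indexing function in as a parameter keeps the sketch independent of any particular Elasticsearch client library.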


Make capture output configurable

The next area selected for performance improvements was reducing processing time by changing how output streams are handled. In this phase, automatically writing standard output and standard error to the database was made configurable. When output capture is turned off, only a non-zero exit code (indicating an error) is returned.

  • This option has been tested, documented for all deployment methods, and released in Archivematica 1.7.1
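The trade-off behind this option can be sketched as follows: when capture is on, both streams are collected so they can be stored; when it is off, they are discarded and only the exit code is kept. `run_task` and `TaskResult` are hypothetical names for illustration, not Archivematica's own task-running code, and the database write is represented only by which fields are populated.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    exit_code: int
    stdout: Optional[str]  # None when output capture is turned off
    stderr: Optional[str]  # None when output capture is turned off


def run_task(args: List[str], capture_output: bool) -> TaskResult:
    """Run a task command, keeping its output only if capture is enabled."""
    if capture_output:
        # Capture on: collect both streams so they can be written to the database.
        proc = subprocess.run(args, capture_output=True, text=True)
        return TaskResult(proc.returncode, proc.stdout, proc.stderr)
    # Capture off: discard both streams so nothing is buffered or stored;
    # only the exit code (non-zero means an error) is kept.
    proc = subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return TaskResult(proc.returncode, None, None)


if __name__ == "__main__":
    # Usage: run a tiny Python command with and without capture.
    cmd = [sys.executable, "-c", "print('hello')"]
    print(run_task(cmd, capture_output=True))
    print(run_task(cmd, capture_output=False))
```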


Make performance metrics accessible via the API

Whenever a preservation task is performed, Archivematica records its timing information (start and end times) in the MySQL database. Columbia University Library wants to be able to measure the processing time (performance) of Archivematica packages and their component microservices so that they can identify bottlenecks, estimate package processing times, and make informed decisions about their configuration.

The problem is that the relevant timing information is not exposed via Archivematica’s API endpoints and is only partially exposed via its GUI. In addition, since this information is internal (not exposed via a public API), it is subject to change and users are therefore wary of building features or implementing workflows that make use of it.

We are therefore implementing an API endpoint that returns processing performance details for a specified package (i.e., a transfer or an AIP) divided by microservice group. This endpoint will return the following data:

  • Phase of processing (transfer or ingest)
  • Microservice group
  • CPU time
  • Number of tasks
  • Duration (wall time)

  • This feature will be released in the next minor release of Archivematica
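The per-group figures listed above could be derived from per-task timing records roughly as in this sketch. The `TaskRecord` fields (in particular `cpu_seconds`) and the `summarize` function are hypothetical, not the endpoint's actual schema. One design assumption is worth noting: wall time is measured from the group's earliest start to its latest end rather than by summing task durations, because tasks in the same group may run in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass
class TaskRecord:
    phase: str          # "transfer" or "ingest"
    group: str          # microservice group name
    start: datetime
    end: datetime
    cpu_seconds: float  # hypothetical per-task CPU time field


def summarize(tasks: Iterable[TaskRecord]) -> List[dict]:
    """Aggregate task timings per (phase, microservice group)."""
    groups: Dict[Tuple[str, str], List[TaskRecord]] = defaultdict(list)
    for task in tasks:
        groups[(task.phase, task.group)].append(task)

    rows = []
    for (phase, group), items in groups.items():
        first_start = min(t.start for t in items)
        last_end = max(t.end for t in items)
        row = {
            "phase": phase,
            "microservice_group": group,
            "cpu_time": sum(t.cpu_seconds for t in items),
            "number_of_tasks": len(items),
            # Span of the whole group, so parallel tasks are not double-counted.
            "duration": (last_end - first_start).total_seconds(),
        }
        rows.append((first_start, row))

    # Report groups in the order processing reached them.
    rows.sort(key=lambda pair: pair[0])
    return [row for _, row in rows]
```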

Other performance improvements TODO

Filter tasks based on the availability of an FPR (Format Policy Registry) rule: this feature was analysed, but development work was not started.
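Since this idea was only analysed, the following is a speculative sketch of what such filtering might look like: before creating tasks, split the files into those whose identified format has an FPR rule for the relevant purpose and those that can be skipped. `FileInfo`, `filter_by_fpr_rule`, and the set of `(format_id, purpose)` pairs standing in for an FPR lookup are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class FileInfo:
    path: str
    format_id: str  # identified format, e.g. a PRONOM PUID such as "fmt/43"


def filter_by_fpr_rule(
    files: Iterable[FileInfo],
    rules: Set[Tuple[str, str]],  # hypothetical stand-in for FPR rules
    purpose: str,
) -> Tuple[List[FileInfo], List[FileInfo]]:
    """Split files into those needing a task and those with no matching rule."""
    to_run: List[FileInfo] = []
    skipped: List[FileInfo] = []
    for f in files:
        # Only files with a rule for this purpose would get a task created.
        (to_run if (f.format_id, purpose) in rules else skipped).append(f)
    return to_run, skipped
```

Returning the skipped files as well would let a workflow report which files were passed over, rather than silently dropping them.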