Difference between revisions of "Performance Improvements"

Revision as of 12:58, 8 June 2018

Make indexing configurable

Make search an optional feature in Archivematica so that it can be run with Elasticsearch turned off.

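The installation methods have been tested, documented (https://www.archivematica.org/en/docs/archivematica-1.7/user-manual/administer/dashboard-admin/#elasticsearch-indexing), and released in Archivematica 1.7.0.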


Make capture output configurable

The next area selected for performance improvements was reducing processing time by changing how output streams are handled. In this phase, the automatic writing of standard output and standard error to the database was made configurable. When output capture is turned off, only a non-zero exit code (an error) is returned.

This option has been tested, documented for all deployment methods, and released in Archivematica 1.7.1.
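
The idea of configurable output capture can be illustrated with a minimal Python sketch. It is not Archivematica's actual implementation: the function name run_command and its capture_output parameter are hypothetical, and the sketch only shows the general pattern of discarding both streams and relying on the exit code when capture is off.

```python
import subprocess
import sys


def run_command(argv, capture_output=True):
    """Run a command and return (exit_code, stdout, stderr).

    Hypothetical helper for illustration only. When capture_output is
    False, both streams are discarded and returned as empty strings,
    so the exit code is the only signal of success or failure.
    """
    if capture_output:
        result = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        return result.returncode, result.stdout, result.stderr
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, "", ""


# A child process that writes to both streams and exits with status 3.
child = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)",
]
captured = run_command(child, capture_output=True)
uncaptured = run_command(child, capture_output=False)
```

With capture on, the caller receives the text of both streams and could store it; with capture off, nothing is kept except the non-zero exit code, which is where the time saving would come from.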


Make performance metrics accessible via the API

Whenever a preservation task is performed, Archivematica records its timing information (start and end times) in the MySQL database. Columbia University Library wants to be able to measure the processing time (performance) of Archivematica packages and their component microservices so that they can identify bottlenecks, estimate package processing times, and make informed decisions about their configuration.

The problem is that the relevant timing information is not exposed via Archivematica’s API endpoints and is only partially exposed via its GUI. In addition, since this information is internal (not exposed via a public API), it is subject to change and users are therefore wary of building features or implementing workflows that make use of it.

We are therefore implementing an API endpoint that returns processing performance details for a specified package (i.e., a transfer or an AIP) divided by microservice group. This endpoint will return the following data:

Phase of processing (transfer or ingest)
Microservice group
CPU time
Number of tasks
Duration (wall time)
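
As a rough illustration of how the fields above could be derived from per-task timing records, the following Python sketch groups tasks by phase and microservice group. The record layout, the field names, and the group labels are assumptions made for this example and are not the endpoint's actual schema. Duration is assumed here to mean the span from the earliest task start to the latest task end within a group.

```python
from collections import defaultdict
from datetime import datetime


def summarize_tasks(tasks):
    """Aggregate task records by (phase, microservice group).

    Hypothetical helper: each task is a dict with the keys "phase",
    "group", "cpu_seconds", "start", and "end" (assumed layout).
    """
    groups = defaultdict(list)
    for task in tasks:
        groups[(task["phase"], task["group"])].append(task)

    summary = []
    for (phase, group), items in groups.items():
        start = min(t["start"] for t in items)
        end = max(t["end"] for t in items)
        summary.append({
            "phase": phase,
            "microservice_group": group,
            "cpu_seconds": sum(t["cpu_seconds"] for t in items),
            "task_count": len(items),
            # Wall time: earliest start to latest end in the group.
            "duration_seconds": (end - start).total_seconds(),
        })
    return summary


# Made-up sample records; the group labels are placeholders.
tasks = [
    {"phase": "transfer", "group": "Group A", "cpu_seconds": 1.5,
     "start": datetime(2018, 6, 8, 12, 0, 0),
     "end": datetime(2018, 6, 8, 12, 0, 4)},
    {"phase": "transfer", "group": "Group A", "cpu_seconds": 0.5,
     "start": datetime(2018, 6, 8, 12, 0, 2),
     "end": datetime(2018, 6, 8, 12, 0, 10)},
    {"phase": "ingest", "group": "Group B", "cpu_seconds": 2.0,
     "start": datetime(2018, 6, 8, 12, 1, 0),
     "end": datetime(2018, 6, 8, 12, 1, 3)},
]
report = summarize_tasks(tasks)
```

Because tasks within a group may overlap in time, wall-time duration can be shorter than the sum of individual task durations, which is why CPU time and duration are reported separately.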