MCPClient

Revision as of 20:02, 9 March 2017 by Hbecker (talk | contribs) (→‎Client script summaries: Expand details on client scripts)

Archivematica runs one or more MCPClient instances to perform the actual work. Each is a gearman worker that tells the gearman server which tasks it can perform, then waits for the server to assign it one. When a client starts, it connects to the configured gearman server and provides a list of the modules it supports. When the MCPServer submits a Task that the client supports and the gearman server assigns that job to the client, the client processes the Job and returns the results to the gearman server, which in turn returns them to the MCPServer.
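The register, assign and return cycle described above can be illustrated with a small in-memory sketch. This is not the gearman protocol or any real gearman library API: ToyJobServer, ToyClient and the task name echo_v0.0 are hypothetical names invented for the illustration.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for a gearman server (hypothetical, not a real API)."""

    def __init__(self):
        self.workers = {}     # task name -> list of clients that registered it
        self.queue = deque()  # submitted (task name, payload) pairs

    def register(self, worker, task_names):
        # A client announces which tasks it can perform.
        for name in task_names:
            self.workers.setdefault(name, []).append(worker)

    def submit(self, task_name, payload):
        # The MCPServer side hands a task to the job server.
        self.queue.append((task_name, payload))

    def run_pending(self):
        # Assign each queued job to a client that registered for it and
        # collect the results that would be passed back to the submitter.
        results = []
        while self.queue:
            task_name, payload = self.queue.popleft()
            candidates = self.workers.get(task_name)
            if not candidates:
                # No client supports this task; report no result.
                results.append((task_name, None))
                continue
            results.append((task_name, candidates[0].process(task_name, payload)))
        return results


class ToyClient:
    """Stand-in for a single MCPClient instance."""

    def __init__(self, handlers):
        self.handlers = handlers  # task name -> callable doing the work

    def connect(self, server):
        # On startup, tell the server which modules this client supports.
        server.register(self, list(self.handlers))

    def process(self, task_name, payload):
        return self.handlers[task_name](payload)


server = ToyJobServer()
client = ToyClient({"echo_v0.0": lambda payload: payload.upper()})
client.connect(server)
server.submit("echo_v0.0", "hello")
results = server.run_pending()
```

In a real deployment the client and server are separate processes talking over the network; the sketch only shows who tells whom what, and in which order.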

Client scripts

Client scripts do the actual work in Archivematica. They are anything that can be run on the command line, from builtins like mv and cp, to custom-written scripts.

New scripts are defined in src/MCPClient/lib/archivematicaClientModules. The entries in this file are what MCPClient registers with Gearman on startup.

Improvement note: archivematicaClientModules lists both 'supportedCommandsSpecial' and 'supportedCommands'. This distinction may once have been based on scripts that relied on external services, but it serves no purpose now and should be removed.

In each entry, the name is what the StandardTasksConfig table uses to refer to the script, and the value is the command that will be run. Some are defined as shell builtins (e.g. copy_v0.0 is cp). Most are paths to a script in the clientScripts directory, using the %clientScriptsDirectory% replacement variable. The name of the client script is usually the same as the name in archivematicaClientModules, but very old scripts may have ‘archivematica’ at the beginning (e.g. createMETS_v2.0 = archivematicaCreateMETS2.py), and some are named more pythonically (e.g. parseExternalMETS = parse_external_mets.py). Entries are added alphabetically.
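As a minimal sketch of how such entries could be resolved, the following assumes the file is INI-style with name = value pairs under a section such as supportedCommands. The sample content and the load_modules helper are illustrative assumptions, not the actual MCPClient loading code.

```python
import configparser

# Hypothetical excerpt in the style of archivematicaClientModules.
SAMPLE = """
[supportedCommands]
copy_v0.0 = cp
createMETS_v2.0 = %clientScriptsDirectory%archivematicaCreateMETS2.py
parseExternalMETS = %clientScriptsDirectory%parse_external_mets.py
"""


def load_modules(text, client_scripts_directory):
    """Return a {name: command} dict with the replacement variable expanded."""
    # interpolation=None stops configparser from treating '%' as its own
    # interpolation syntax, which would otherwise reject these values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of names such as createMETS_v2.0
    parser.read_string(text)

    modules = {}
    for section in parser.sections():
        for name, command in parser.items(section):
            modules[name] = command.replace(
                "%clientScriptsDirectory%", client_scripts_directory
            )
    return modules


modules = load_modules(SAMPLE, "/usr/lib/archivematica/MCPClient/clientScripts/")
```

Here copy_v0.0 stays as the plain command cp, while the other two entries become absolute paths inside the client scripts directory.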

The version suffix (e.g. copy_v0.0) was originally intended to version the scripts as they changed so that those changes could be tracked, but that did not happen. Newer scripts may not have a version defined.

The list of client scripts is sorted roughly in order of appearance during processing.

createMETS_v0.0

elasticSearchIndex_v0.0

The data in ElasticSearch is used by the Backlog tab, SIP Arrangement and the Appraisal tab when dealing with files from backlog.

Improvement note: The client config 'disableElasticsearchIndexing' can disable indexing, but this should be removed, since searching for files in backlog is required functionality.

createMETS_v2.0

Perhaps the most important script in Archivematica: it creates the AIP METS which contains all the archival metadata generated by previous client scripts.

This script imports from several other files for additional functionality:

  • archivematicaCreateMETSMetadataCSV
  • archivematicaCreateMETSRights
  • archivematicaCreateMETSRightsDspaceMDRef
  • archivematicaCreateMETSTrim

On reingest, it short-circuits and runs archivematicaCreateMETSReingest to update the METS file instead.

storeAIP_v0.0

  • Purpose: Send the completed AIP to the storage service
  • Script: storeAIP.py
  • Used in: SIP

Sends the AIP currently being processed to the storage service. The Location is chosen in a previous task from the list of AIP Storage Locations associated with the Pipeline.


  • Purpose:
  • Script: [1]
  • Used in:
  • Tests:

Config File

Several config settings are read from /etc/archivematica/MCPClient/clientConfig.conf on startup.

Variables in the MCPClient section:

Variable | Description | Default value
MCPArchivematicaServer | URL of the MCP gearman server. Must match the server config file. | localhost:4730
sharedDirectoryMounted | Directory structure owned by Archivematica and shared between the MCPServer and MCPClient. Must match the server config file. | /var/archivematica/sharedDirectory/
archivematicaClientModules | Path to the list of jobs to register with Gearman. | /usr/lib/archivematica/MCPClient/archivematicaClientModules
clientScriptsDirectory | Path to the directory where client scripts are installed. Used when parsing archivematicaClientModules. | /usr/lib/archivematica/MCPClient/clientScripts/
LoadSupportedCommandsSpecial | Whether or not to register the SupportedCommandsSpecial section of archivematicaClientModules. This should be removed. | True
numberOfTasks | Number of MCPClient workers to create. 0 detects the number of cores and uses that. | 0
elasticsearchServer | URL of the ElasticSearch server. | localhost:9200
disableElasticsearchIndexing | If true, do not index AIPs or Transfers in backlog. This should be removed, since ElasticSearch indexing is required. | False
temp_dir | Path to the temporary usage directory. Should be in the shared directory. | /var/archivematica/sharedDirectory/tmp
kioskMode | Dashboard setting that disables editing users. This should be removed, or at least moved to dashboard settings. | False
removableFiles | List of filenames that are not archivally significant and can be removed. | Thumbs.db, Icon, Icon\r, .DS_Store
django_settings_module | Name of the Django settings module, so the client scripts can access the database via the Django ORM. | settings.common
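A minimal sketch of reading a few of these settings, assuming clientConfig.conf is an INI-style file with an MCPClient section. The read_client_settings helper, the keys of the dictionary it returns, and the comma-splitting of removableFiles are assumptions made for illustration, not the actual MCPClient startup code.

```python
import configparser
import multiprocessing

# Hypothetical excerpt using default values from the table above.
SAMPLE_CONFIG = r"""
[MCPClient]
MCPArchivematicaServer = localhost:4730
numberOfTasks = 0
removableFiles = Thumbs.db, Icon, Icon\r, .DS_Store
disableElasticsearchIndexing = False
"""


def read_client_settings(text):
    """Parse the MCPClient section into a plain dictionary."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(text)
    section = parser["MCPClient"]

    # 0 means "use one worker per detected core".
    workers = section.getint("numberOfTasks")
    if workers == 0:
        workers = multiprocessing.cpu_count()

    # Assumption: the filename list is comma-separated.
    removable = [name.strip() for name in section["removableFiles"].split(",")]

    return {
        "server": section["MCPArchivematicaServer"],
        "workers": workers,
        "removable_files": removable,
        "index_enabled": not section.getboolean("disableElasticsearchIndexing"),
    }


settings = read_client_settings(SAMPLE_CONFIG)
```

The resulting worker count depends on the machine the sketch runs on, since a value of 0 falls back to the number of detected cores.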