Difference between revisions of "MCPClient"
Revision as of 20:02, 9 March 2017
Archivematica has one or more MCPClient instances to perform the actual work. They are gearman worker implementations that inform the gearman server what tasks they can perform, and wait for the server to assign them a task. When a client starts, it connects to the specified gearman server and provides a list of modules they support. When the MCPServer informs the gearman server of a Task that the client supports and the gearman server assigns the job to the client, the client will process the Job, and return the results to the gearman server, which in turn will return them to the MCPServer.
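The register, wait, process, and return cycle described above can be sketched without a running Gearman server. This is a minimal in-memory stand-in, not the MCPClient implementation: `ToyJobServer`, `ToyWorker`, and the `echo_v0.0` task are hypothetical names invented for illustration, and a real client would talk to Gearman through a client library instead.

```python
from collections import deque


class ToyJobServer:
    """Hypothetical in-memory stand-in for the gearman server."""

    def __init__(self):
        self.workers = {}     # task name -> worker able to run it
        self.queue = deque()  # pending (task name, payload) jobs

    def register(self, worker, task_names):
        # A starting client announces which tasks it supports.
        for name in task_names:
            self.workers[name] = worker

    def submit(self, task_name, payload):
        # The MCPServer side hands a job to the job server.
        self.queue.append((task_name, payload))

    def dispatch(self):
        # Assign each queued job to a worker that registered for it and
        # collect the results that would go back to the MCPServer.
        results = []
        while self.queue:
            task_name, payload = self.queue.popleft()
            worker = self.workers[task_name]
            results.append(worker.run(task_name, payload))
        return results


class ToyWorker:
    """Hypothetical stand-in for one MCPClient instance."""

    def __init__(self, handlers):
        self.handlers = handlers  # task name -> callable

    def run(self, task_name, payload):
        return self.handlers[task_name](payload)


server = ToyJobServer()
# "echo_v0.0" is an invented task name, not a real client script.
client = ToyWorker({"echo_v0.0": lambda payload: {"exit_code": 0, "stdout": payload}})
server.register(client, client.handlers.keys())
server.submit("echo_v0.0", "hello")
results = server.dispatch()
```

The sketch runs everything in one process for clarity; in Archivematica the server and the clients are separate processes that communicate only through Gearman.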
Client scripts
Client scripts do the actual work in Archivematica. They are anything that can be run on the command line, from builtins like mv and cp, to custom-written scripts.
New scripts are defined in src/MCPClient/lib/archivematicaClientModules, which lists the tasks that are registered with Gearman on MCPClient startup.
Improvement note: archivematicaClientModules lists both 'supportedCommandsSpecial' and 'supportedCommands'. This distinction may once have been based on scripts that relied on external services, but it serves no purpose now and should be removed.
Each entry is a name and a value. The name is what the StandardTasksConfig table uses to refer to the script, and the value is the command that will be run. Some are defined as shell builtins (e.g. copy_v0.0 is cp). Most are paths to a script in the clientScripts directory, using the %clientScriptsDirectory% replacement variable. The file name of the client script is usually the same as the name in archivematicaClientModules, but very old scripts may have 'archivematica' at the beginning (e.g. createMETS_v2.0 = archivematicaCreateMETS2.py), and others are named more pythonically (e.g. parseExternalMETS = parse_external_mets.py). Entries are added alphabetically.
The version suffix (e.g. copy_v0.0) was originally intended to version the scripts as they changed and to track those changes, but that did not happen. Newer scripts may not have a version defined.
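The name-to-command mapping described above can be illustrated with a short parser. This is a sketch under assumptions, not the MCPClient loader: it assumes the file is INI-style with a `supportedCommands` section, the sample entries are taken from the examples in the text, and `load_client_modules` is a hypothetical helper name.

```python
import configparser

# Hypothetical excerpt; the layout is an assumption based on the
# description above, not a copy of the real file.
SAMPLE_MODULES = """
[supportedCommands]
copy_v0.0 = cp
createMETS_v2.0 = %clientScriptsDirectory%archivematicaCreateMETS2.py
parseExternalMETS = %clientScriptsDirectory%parse_external_mets.py
"""


def load_client_modules(text, client_scripts_directory):
    """Return {task name: command} with the replacement variable expanded."""
    # interpolation=None stops configparser from treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep task names case-sensitive
    parser.read_string(text)
    modules = {}
    for section in parser.sections():
        for name, command in parser.items(section):
            modules[name] = command.replace(
                "%clientScriptsDirectory%", client_scripts_directory
            )
    return modules


modules = load_client_modules(
    SAMPLE_MODULES, "/usr/lib/archivematica/MCPClient/clientScripts/"
)
```

Shell builtins such as `cp` pass through unchanged, while script entries resolve to absolute paths inside the client scripts directory.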
The client scripts below are listed roughly in order of appearance during processing.
createMETS_v0.0
- Purpose: Generate the transfer METS file
- Script: archivematicaCreateMETS.py
- Used in: Transfer
elasticSearchIndex_v0.0
- Purpose: Index the Transfer METS into ElasticSearch when sending files to backlog
- Script: elasticSearchIndexProcessTransfer.py
- Used in: Transfer
The data in ElasticSearch is used by the Backlog tab, SIP Arrangement and the Appraisal tab when dealing with files from backlog.
Improvement note: The client config 'disableElasticsearchIndexing' can disable indexing, but this should be removed, since searching for files in backlog is required functionality.
createMETS_v2.0
- Purpose: Generate the AIP METS file
- Script: archivematicaCreateMETS2.py
- Used in: SIP
- Tests:
  - test_create_aip_mets.py
  - test_reingest_mets.py
Perhaps the most important script in Archivematica: it creates the AIP METS which contains all the archival metadata generated by previous client scripts.
This script imports from several other files for additional functionality:
- archivematicaCreateMETSMetadataCSV
- archivematicaCreateMETSRights
- archivematicaCreateMETSRightsDspaceMDRef
- archivematicaCreateMETSTrim
On reingest, it short-circuits and runs archivematicaCreateMETSReingest to update the METS file instead.
storeAIP_v0.0
- Purpose: Send the completed AIP to the storage service
- Script: storeAIP.py
- Used in: SIP
Sends the currently processing AIP to the storage service. The Location is selected from the list of AIP Storage Locations associated with the Pipeline in previous tasks.
- Purpose:
- Script: [1]
- Used in:
- Tests:
Config File
Several config settings are read from /etc/archivematica/MCPClient/clientConfig.conf on startup.
Variables in the MCPClient section:
Variable | Description | Default value |
---|---|---|
MCPArchivematicaServer | URL of the MCP gearman server. Must match the server config file. | localhost:4730 |
sharedDirectoryMounted | Directory structure owned by Archivematica and shared between the MCPServer & MCPClient. Must match the server config file. | /var/archivematica/sharedDirectory/ |
archivematicaClientModules | Path to the list of jobs to register with Gearman | /usr/lib/archivematica/MCPClient/archivematicaClientModules |
clientScriptsDirectory | Path to the directory where client scripts are installed. Used when parsing archivematicaClientModules | /usr/lib/archivematica/MCPClient/clientScripts/ |
LoadSupportedCommandsSpecial | Whether or not to register the SupportedCommandsSpecial section of archivematicaClientModules. This should be removed. | True |
numberOfTasks | Number of MCPClient workers to create. 0 detects the number of cores and uses that. | 0 |
elasticsearchServer | URL of the ElasticSearch server. | localhost:9200 |
disableElasticsearchIndexing | If true, do not index AIPs or Transfers in backlog. This should be removed, since ElasticSearch indexing is required | False |
temp_dir | Path to the temporary usage directory. Should be in the shared directory | /var/archivematica/sharedDirectory/tmp |
kioskMode | Dashboard setting that disables editing users. This should be removed, or at least moved to dashboard settings | False |
removableFiles | List of filenames that are not archivally significant and can be removed. | Thumbs.db, Icon, Icon\r, .DS_Store |
django_settings_module | Name of the Django settings module, so the client scripts can access the database via the Django ORM. | settings.common |
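As a rough illustration of how a few of these settings might be read, the sketch below parses an INI-style excerpt and applies the defaults from the table. It is not the MCPClient startup code: `read_client_config`, the returned key names, and the sample excerpt are assumptions made for this example.

```python
import configparser
import os

# Hypothetical excerpt of clientConfig.conf using defaults from the table.
SAMPLE_CONFIG = """
[MCPClient]
MCPArchivematicaServer = localhost:4730
sharedDirectoryMounted = /var/archivematica/sharedDirectory/
numberOfTasks = 0
disableElasticsearchIndexing = False
"""


def read_client_config(text):
    """Parse the MCPClient section, falling back to the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["MCPClient"]

    # 0 means "one worker per detected core", as the table describes.
    number_of_tasks = section.getint("numberOfTasks", fallback=0)
    if number_of_tasks == 0:
        number_of_tasks = os.cpu_count() or 1

    return {
        "gearman_server": section.get(
            "MCPArchivematicaServer", fallback="localhost:4730"
        ),
        "shared_directory": section.get(
            "sharedDirectoryMounted",
            fallback="/var/archivematica/sharedDirectory/",
        ),
        "number_of_tasks": number_of_tasks,
        "disable_indexing": section.getboolean(
            "disableElasticsearchIndexing", fallback=False
        ),
    }


settings = read_client_config(SAMPLE_CONFIG)
```

Reading from a string keeps the example self-contained; pointing `parser.read()` at the real file path would be the natural substitution.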