Difference between revisions of "MCPServer"

(Start of MCP documentation.)
 
(46 intermediate revisions by 6 users not shown)
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > MCP
+
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > MCP Server
  
 +
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
  
<div class="status">
+
The MCP Server is the core of the Archivematica system. It controls the various [[micro-services]] in the Archivematica system.  Configuration and processing information are held in the database. The user monitors and controls the status & workflow via the [[ dashboard ]]. The MCP Server maintains a log of all completed work in the database and log files.
<div>
 
Design
 
<div class="description">
 
This page proposes a new feature and reviews design options
 
</div>
 
</div><div class="active">
 
Development
 
<div class="description">
 
This page describes a feature that's in development
 
</div>
 
</div><div>
 
Documentation
 
<div class="description">
 
This page documents an implemented feature
 
</div>
 
</div>
 
</div>
 
  
==Overview==
+
Microservices are run in chains. On startup, Archivematica checks all the watched directories it is aware of.  For each file in those directories, it creates a Job Chain and an associated unit (SIP, DIP or Transfer). The Job Chain runs until completion, which usually puts the Transfer/SIP in another directory, to start another chain.
The MCP is the Archivematica [[micro-services]] tool to control flow in the Archivematica system. It "knows" what things need to be done, who can do them, what is currently being processed across the distributed system, and is responsible for distributing the work. The user controls and monitors the MCP via the [[ dashboard ]]. The MCP maintains a log of all completed work.
 
  
The MCP is a client server based architecture. The clients are relatively "dumb". They inform the server what tasks they can perform, and wait for the server to assign them a task.
+
The MCP uses [http://gearman.org Gearman] as a task manager. The [[MCPClient | MCP Clients]] are relatively "dumb": they are Gearman worker implementations that tell the Gearman server which tasks they can perform, then wait for the server to assign them a task.
  
The system relies on client and server having access to the same directory, to process the commands.
+
The Archivematica system relies on the client and server having access to the same directory in order to process commands. On a distributed system, this is done through the shared directory.
  
==Server==
+
All files for the MCPServer are found in ''src/MCPServer/lib/'', and all files for the [[MCPClient]] are found in ''src/MCPClient/lib/''.
The server is the core of the MCP. It uses a set of modules, created at run time based on configurations (mcpModulesConfigs). These configs specify a directory to watch, and a series of commands to execute on anything placed in the directory.
 
  
===Server Implementation===
 
As stated above, the MCP watches directories specified in the modules, and has a set of commands to issue on items placed in those folders. Those commands will not be performed by the MCP server itself, but rather delegated to a client.
 
  
===mcp Modules===
+
== MCP Server ==
The mcp Modules are XML based, and contain a number of fields:
 
<module>
 
Wait for user approval, before creating and assigning tasks.
 
<requiresUserApproval>Yes/No</requiresUserApproval>
 
Description to give the user when approving the Job (not implemented yet).
 
<descriptionForApproval></descriptionForApproval>
 
<notificationStarted></notificationStarted>
 
<notificationCompletedWithoutErrors></notificationCompletedWithoutErrors>
 
<notificationCompletedWithErrors></notificationCompletedWithErrors>
 
 
 
<directories>
 
Directory to watch for folders moved to.
 
  <watchDirectory>%watchDirectoryPath%appraiseSIP</watchDirectory>
 
Standard directory to move to while the clients are performing tasks. Defined in [[ archivematica.conf ]].
 
  <processingDirectory>%processingDirectory%</processingDirectory>
 
The output directory is determined by the return value of the commands, and the corresponding config folder.
 
Chaining output folders and watch directories allows for flow through the system.
 
  <successDirectory>%watchDirectoryPath%...</successDirectory>
 
  <failureDirectory>%watchDirectoryPath%failed/</failureDirectory>
 
</directories>
 
  
 +
The MCPServer tracks directories defined in '''WatchedDirectories'''.  When it sees new files in a watched directory, it starts the associated '''MicroServiceChains''' which knows the first '''MicroServiceChainLinks''' to run.
  
  <commands>
+
The workflow of Archivematica is defined by the '''MicroServiceChainLinks''' table. These, in conjunction with '''TasksConfigs''' and '''StandardTasksConfigs''', determine what script to run and how to run it. '''StandardTasksConfigs''' lists the script name and parameters. '''TasksConfigs''' defines how the script will be run (once for the SIP, once per file, etc. See [[#TaskTypes]] for more details) and what text is displayed in the dashboard. '''MicroServiceChainLinks''' and '''MicroServiceChainLinksExitCodes''' control what link is run next.   
A command to execute on each file or folder, the details of which are described below in 'mcp Modules Command'.
 
The exe command is the command to run on the Event.
 
<exeCommand>
 
The verification command is used if the return value of the exeCommand is not reliable enough to determine the result of the command (e.g. a virus scan that exits zero but logs that there is a virus).
 
<verificationCommand>
 
Cleanup allows for some post processing before the Event folder is moved to the success or fail directory.
 
<cleanupSuccessfulCommand>
 
<cleanupUnsuccessfulCommand>
 
  </commands>
 
</module>
 
  
 +
When a link is run, it returns an exit code. If the exit code is listed in '''MicroServiceChainLinksExitCodes''' for that link, the next link run is '''MicroServiceChainLinksExitCodes.nextMicroServiceChainLink'''. If not, the next link run is '''MicroServiceChainLinks.defaultNextChainLink'''.
 +
Because of this, defaultNextChainLink is often the failure case. If the defaultNextChainLink is the 'Failed Transfer' chain (starting link: ''61c316a6-0a50-4f65-8767-1f44b1eeb6dd'') or the 'Failed SIP' chain (starting link: ''7d728c39-395f-4892-8193-92f086c0546f''), then the SIP has 'failed' from a user perspective.  Not all MicroServiceChainLinks are considered critical, and sometimes the '''MicroServiceChainLinks.defaultNextChainLink''' is the same as the '''MicroServiceChainLinksExitCodes.nextMicroServiceChainLink'''.
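The lookup described above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries are hypothetical in-memory stand-ins for the '''MicroServiceChainLinks''' and '''MicroServiceChainLinksExitCodes''' tables, the names ''link-a'' and ''link-b'' are invented, and the real MCPServer reads these rows from the database.

```python
# Hypothetical in-memory stand-ins for two database tables.
FAILED_TRANSFER_LINK = "61c316a6-0a50-4f65-8767-1f44b1eeb6dd"

# MicroServiceChainLinks: link -> defaultNextChainLink
default_next_chain_link = {
    "link-a": FAILED_TRANSFER_LINK,
}

# MicroServiceChainLinksExitCodes: (link, exitCode) -> nextMicroServiceChainLink
exit_code_next_link = {
    ("link-a", 0): "link-b",
}


def next_link(current_link, exit_code):
    """Return the link to run after current_link exits with exit_code."""
    key = (current_link, exit_code)
    if key in exit_code_next_link:
        # The exit code is listed for this link, so follow that entry.
        return exit_code_next_link[key]
    # Otherwise fall back to the default, which is often the failure case.
    return default_next_chain_link.get(current_link)
```

With this data, exit code 0 leads to ''link-b'', while any other exit code falls through to the 'Failed Transfer' starting link.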
  
====mcp Modules Command====
+
When a link is run for a particular Transfer or SIP, '''Jobs''' and sometimes '''Tasks''' are created to track this. A row in '''Jobs''' represents the execution of one '''MicroServiceChainLinks''' entry.  Most task types run no client scripts and are strictly for workflow control, so they have no '''Tasks'''.  Three task types ([[MCP/TaskTypes#Run_once|run once]], [[MCP/TaskTypes#Run_for_each_file|run for each file]], and [[MCP/TaskTypes#Generate_User_Choice_in_MicroService|generate user choice in microservice]]) run a client script and track the output in '''Tasks''' that link back to the same Job.  These task types account for the majority of Jobs run.  When you view details for the Normalization link, for example, you're looking at the normalization Job, which aggregates data from all the Tasks that were run.  All the Tasks can run in parallel (up to the number of MCPClients that are running), but a Job won't complete until all of its Tasks are done. When that happens, it will move on to the next link.
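The relationship between a Job and its Tasks can be illustrated with a small Python sketch. It is not how the MCPServer is implemented (the real system hands tasks to Gearman workers); a thread pool stands in for the running MCPClients, ''run_task'' is a hypothetical placeholder for a client script, and reporting the highest exit code as the Job result is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(file_name):
    """Stand-in for one client script invocation; returns (file_name, exit_code)."""
    return file_name, 0


def run_job(file_names, client_count):
    """Run one Task per file, at most client_count at a time.

    The Job only completes once every Task has finished. Returning the
    highest exit code is an assumption made for this sketch.
    """
    with ThreadPoolExecutor(max_workers=client_count) as pool:
        # list() blocks until all Tasks are done.
        results = list(pool.map(run_task, file_names))
    return max((code for _, code in results), default=0)
```

For example, ''run_job(["a.tga", "b.tga"], client_count=2)'' runs both Tasks concurrently and returns only after both have finished.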
Each command consists of a number of parts.
+
 
  <command>
+
=== Startup ===
 +
* ''archivematicaMCP.py'' looks at all watched directories in '''WatchedDirectories'''
 +
** table '''WatchedDirectories''', file ''src/MCPServer/lib/archivematicaMCP.py''
 +
* for each file in each directory, create unit (''SIP'', ''DIP'', ''Transfer'') and job chain
 +
** functions ''watchDirectories'', ''createUnitAndJobChainThreaded'', ''createUnitAndJobChain''
 +
* See below for how a job chain works
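The startup scan above might look roughly like the following Python sketch. The dictionary keys (''path'', ''expected_type'', ''chain'') are hypothetical stand-ins for columns of '''WatchedDirectories''', and the function only reports what would be created rather than creating units or job chains.

```python
import os
import tempfile


def scan_watched_directories(watched_directories):
    """Yield (path, expected_type, chain) for every entry found at startup.

    watched_directories is a list of dicts standing in for rows of the
    WatchedDirectories table (hypothetical keys: path, expected_type, chain).
    """
    for row in watched_directories:
        for name in sorted(os.listdir(row["path"])):
            yield os.path.join(row["path"], name), row["expected_type"], row["chain"]


def demo():
    """Build a throwaway watched directory and list what the scan finds."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "my-transfer"))
        rows = [{"path": root, "expected_type": "Transfer", "chain": "chain-1"}]
        return [
            (os.path.basename(path), unit_type, chain)
            for path, unit_type, chain in scan_watched_directories(rows)
        ]
```

Each yielded tuple corresponds to one unit and one job chain that would be started.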
 +
 
 +
=== Tasks Workflow/Job Chains ===
 +
* ''jobChain.py'' looks up its first link in '''MicroServiceChains.startingLink''' and starts it
 +
* ''jobChainLink.py'' finds its task in '''MicroServiceChainLinks.currentTask'''
 +
* ''jobChainLink.py'' continues and looks up what type of task it is in '''TasksConfigs.taskType''', instantiates a ''LinkTaskManager*.py'' class based on that and passes class-specific info from '''TasksConfigs.taskTypePKReference'''
 +
* Most ''LinkTaskManager*.py'' look up the configuration for the task in '''StandardTasksConfigs''' using '''TasksConfigs.taskTypePKReference''', create the actual task, and hand it to Gearman to run
 +
** Replacement variables (%foo%) are replaced in the ''LinkTaskManager*.py''.  Replacements are defined in ''dicts.py'' in archivematicaCommon, and in '''MicroServiceChoiceReplacementDic.replacementDic'''. See [[#Replacement Variables]] for details.
 +
** Other ''LinkTaskManager*.py'' use '''TasksConfigs.taskTypePKReference''' as a foreign key to another table, or do not use it at all.  See [[#TaskTypes]] section and [[#Database Reference]] for more information
 +
* Gearman runs the tasks, and collects the exit code, stdout and stderr, and calls back to ''LinkTaskManager*.py.taskCompletedCallBackFunction'', which calls ''jobChainLink.py.linkProcessingComplete''
 +
* ''jobChainLink.py'' looks in '''MicroServiceChainLinksExitCodes''' for the pairing of MicroServiceChainLink and exitCode, to find nextMicroServiceChainLink
 +
** If nothing is found, use '''MicroServiceChainLinks.defaultNextChainLink'''
 +
* ''jobChain.py'' gets the UUID of the next MicroServiceChainLinks, and starts the next ''jobChainLink'' (see above)
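The steps above can be sketched as a single loop. This is a simplified illustration, not the MCPServer code: the dictionaries are hypothetical stand-ins for database rows, the identifiers such as ''chain-1'' and ''link-a'' are invented, and ''run_once'' plays the role of a ''LinkTaskManager*.py'' class that would normally hand the task to Gearman.

```python
# Hypothetical stand-ins for database rows, keyed by primary key.
CHAINS = {"chain-1": {"startingLink": "link-a"}}
LINKS = {
    "link-a": {"currentTask": "task-a", "defaultNextChainLink": None},
    "link-b": {"currentTask": "task-b", "defaultNextChainLink": None},
}
TASKS_CONFIGS = {
    "task-a": {"taskType": "run once", "taskTypePKReference": "std-a"},
    "task-b": {"taskType": "run once", "taskTypePKReference": "std-b"},
}
STANDARD_TASKS_CONFIGS = {
    "std-a": {"execute": "script_a", "arguments": ""},
    "std-b": {"execute": "script_b", "arguments": ""},
}
EXIT_CODES = {("link-a", 0): "link-b"}


def run_once(standard_config):
    """Stand-in for handing a task to Gearman; always reports exit code 0."""
    return 0


# Maps TasksConfigs.taskType to the callable acting as a LinkTaskManager.
TASK_MANAGERS = {"run once": run_once}


def run_chain(chain_id):
    """Walk a chain from its starting link; return the links run, in order."""
    executed = []
    link_id = CHAINS[chain_id]["startingLink"]
    while link_id is not None:
        link = LINKS[link_id]
        config = TASKS_CONFIGS[link["currentTask"]]
        manager = TASK_MANAGERS[config["taskType"]]
        exit_code = manager(STANDARD_TASKS_CONFIGS[config["taskTypePKReference"]])
        executed.append(link_id)
        # Exit-code match first, then the link's default next link.
        link_id = EXIT_CODES.get((link_id, exit_code), link["defaultNextChainLink"])
    return executed
```

Here the chain stops when no next link is found; in this sample data that happens after ''link-b''.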
 +
 
 +
===Task Types===
 +
This has been moved to [[/TaskTypes | a task types sub page]]
 +
 
 +
 
 +
== Database Reference ==
 +
 
 +
=== Database Schema Diagram===
 +
The MCP Modules are configured in the database, with the following schema.  (Generated using MySQL Workbench, which can be installed with <code>sudo apt-get install mysql-workbench</code>.)
 +
 
 +
[[File:MCP_configuration_database_schema.png]]
 +
 
 +
=== MicroServiceChains ===
 +
* '''Intent''': Entry point into chains
 +
* '''Knows''': Description to display (if the user has to choose it), starting chain link
 +
* '''Referenced by''': MicroServiceChainChoice, WatchedDirectories,
 +
* startingLink is foreign key to MicroServiceChainLinks
 +
* configuration for '[[/TaskTypes#Get User Choice - select chain|get user choice to proceed with]]'
 +
 
 +
=== MicroServiceChainLinks ===
 +
* '''Intent''': The task, when/how to do it, and where to go next.  Often referenced
 +
* '''Knows''': currentTask, default/failed next link
 +
* '''Referenced by''': Jobs, MicroServiceChainChoice, MicroServiceChainLinks (itself), MicroServiceChainLinksExitCodes, MicroServiceChains, MicroServiceChoiceReplacementDic, SIPs, Transfers
 +
* currentTask is foreign key to TasksConfigs
 +
 
 +
=== TasksConfigs ===
 +
* aka "the weird table"
 +
* '''Intent''': Starting point to find configs for a link
 +
* '''Knows''': taskType, taskTypePKReference (aka class specific information), Job description presented to user
 +
* '''Referenced by''': MicroServiceChainLinks
 +
* taskType is a foreign key to TaskTypes
 +
* taskTypePKReference is semantically a foreign key to another table (determined by taskType) (often StandardTasksConfigs)
 +
* taskType determines what table to look at, taskTypePKReference is semantically a foreign key to a row in the specified table
 +
 
 +
=== TaskTypes ===
 +
* '''Intent''': Define the generic ways a task can be run, e.g. run once, run once for each file, give the user choices, etc.
 +
*  '''Knows''': N/A
 +
* '''Referenced by''': TasksConfigs
 +
* Each TaskType is associated with a linkTaskManager* class
 +
* [[/TaskTypes | More detail ]]
 +
 
 +
=== MicroServiceChainLinksExitCodes ===
 +
* '''Intent''': Show which ChainLink to execute next
 +
* '''Knows''': chainLink associated with it, an exit code, where to go with that
 +
* '''Referenced by''': None
 +
* microServiceChainLink is a foreign key to MicroServiceChainLinks
 +
* nextMicroServiceChainLink is also a foreign key to MicroServiceChainLinks
 +
* Usually there's only one exit code defined for a MicroServiceChainLinks (usually completed successfully), but can also be used to branch workflow depending on result (exit code) of a task.
 +
 
 +
=== StandardTasksConfigs ===
 +
* '''Intent''': Configurations for common tasks
 +
* '''Knows''': arguments, what to execute
 +
* '''Referenced by''': None
 +
* configuration for
 +
** '[[/TaskTypes#Run once |one instance]]' (''LinkTaskManagerDirectories.py'')
 +
** '[[/TaskTypes#Run for each file |for each file]]', (''LinkTaskManagerFiles.py'')
 +
** '[[/TaskTypes#Generate User Choice in MicroService |Get microservice generated list in stdOut]]' (''linkTaskManagerGetMicroserviceGeneratedListInStdOut.py'')
 +
** '[[/TaskTypes#Get User Choice - select from MicroService Generated List |Get user choice from microservice generated list]]' (''linkTaskManagerGetUserChoiceFromMicroserviceGeneratedList.py'')
 +
** '[[/TaskTypes#Transcoding |Split Job into many links based on file ID]]' (''linkTaskManagerSplitOnFileIdAndruleset'')
 +
 
 +
=== WatchedDirectories ===
 +
* '''Intent''': Knows what directories to watch, and what to do when something happens
 +
* '''Knows''': watched path, what to do, expected type=unit (eg. SIP, DIP, transfer)
 +
* '''Referenced by''': None
 +
* chain is a foreign key to MicroServiceChains
 +
* expectedType is a foreign key to WatchedDirectoriesExpectedTypes
 +
 
 +
=== MicroServiceChainChoice ===
 +
* '''Intent''': Configuration for taskType 'get user choice to proceed with', selection of a chain to process
 +
* '''Knows''': Chain Link that the choice comes from, chain to go to if selected
 +
* '''Referenced by''': None
 +
* choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
 +
* chainAvailable is a foreign key to MicroServiceChains
 +
* configuration for '[[/TaskTypes#Get User Choice - select chain |get user choice to proceed with]]' (''linkTaskManagerChoice.py'')
 +
 
 +
=== MicroServiceChoiceReplacementDic ===
 +
* '''Intent''': Configuration for taskType 'get replacement dic from user choice'
 +
* '''Knows''': Chain Link that the choice comes from, text to display, and a dict of {replacement_string : code_thing }
 +
* '''Referenced by''': None
 +
* choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
 +
* configuration for '[[/TaskTypes#Get User Choice - select replacement dict |get replacement dic from user choice]]' (''linkTaskManagerReplacementDicFromChoice.py'')
 +
 
 +
=== TasksConfigsSetUnitVariable ===
 +
* '''Intent''': Configuration for taskType 'linkTaskManagerSetUnitVariable'
 +
* '''Knows''': Variable name, value to set for variable (either variableValue or microServiceChainLink)
 +
* '''Referenced by''': None
 +
* microServiceChainLink is a foreign key to MicroServiceChainLinks
 +
* configuration for '[[/TaskTypes#Set Unit Variable |linkTaskManagerSetUnitVariable]]' (''linkTaskManagerSetUnitVariable.py'')
 +
 
 +
=== TasksConfigsUnitVariableLinkPull ===
 +
* '''Intent''': Configuration for taskType 'linkTaskManagerUnitVariableLinkPull'
 +
* '''Knows''': Variable name, default MicroServiceChainLink to go to if no variable found
 +
* '''Referenced by''': None
 +
* defaultMicroServiceChainLink is a foreign key to MicroServiceChainLinks
 +
* configuration for '[[/TaskTypes#Get Unit Variable MicroServiceChainLink |linkTaskManagerUnitVariableLinkPull]]' (''linkTaskManagerUnitVariableLinkPull.py'')
 +
 
 +
=== TasksConfigsStartLinkForEachFile ===
 +
* '''Intent''': Configuration for taskType 'Split Job into many links based on file ID'
 +
* '''Knows''': Normalization chain to run, folder to run on
 +
* '''Referenced by''': None
 +
* execute is a foreign key to MicroServiceChains
 +
* configuration for '[[/TaskTypes#Transcoding |Split Job into many links based on file ID]]' (''linkTaskManagerSplitOnFileIdAndruleset.py'')
 +
 
 +
=== TasksConfigsAssignMagicLink - Deprecated ===
 +
* '''Intent''': Configuration for taskType 'assign magic link'
 +
* '''Knows''': MicroServiceChainLink to run next time Load Magic Link is called for that unit.
 +
* '''Referenced by''': None
 +
* execute is a foreign key to MicroServiceChainLinks
 +
* configuration for '[[/TaskTypes#Set Magic Link - Deprecated |assign magic link]]' (''linkTaskManagerAssignMagicLink.py'')
 +
 
 +
== Replacement Variables ==
 +
 
 +
In '''StandardTasksConfigs''', the arguments field can be populated with placeholder values structured like %name%.  When the script is run, these values are populated with the actual value. For example, <code>%SIPUUID%</code> is replaced with the actual SIP UUID value <code>962c7299-6e37-4e67-acdf-800c5b6fbee4</code>
 +
 
 +
The job types that perform a replacement on the values are [[/TaskTypes#Run once |linkTaskManagerDirectories]] (unit values), [[/TaskTypes#Run for each file |linkTaskManagerFiles]] (file values) and [[/TaskTypes#Generate User Choice in MicroService |linkTaskManagerGetMicroserviceGeneratedListInStdOut]] (unit values).  These values are populated by <code>src/archivematicaCommon/lib/dicts.py</code> in <code>ReplacementDict.frommodel</code>
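As a rough illustration of the substitution, the following Python sketch replaces each %name% placeholder in an argument string. It is not the ''ReplacementDict'' implementation from ''dicts.py''; the function name and the sample argument string are assumptions made for the example.

```python
def replace_placeholders(arguments, replacements):
    """Substitute every %name% key found in arguments with its value."""
    for placeholder, value in replacements.items():
        arguments = arguments.replace(placeholder, value)
    return arguments


# Hypothetical arguments field, as it might appear in StandardTasksConfigs.
example = replace_placeholders(
    '"%SIPUUID%" "%processingDirectory%"',
    {
        "%SIPUUID%": "962c7299-6e37-4e67-acdf-800c5b6fbee4",
        "%processingDirectory%": "/var/archivematica/sharedDirectory/currentlyProcessing/",
    },
)
```

Placeholders with no entry in the dictionary are left untouched by this sketch.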
 +
 
 +
Values always available:
 +
 
 +
{|
 +
|-
 +
! Variable !! Value source !! Example value
 +
|-
 +
| %processingDirectory% || MCPServer config "processingDirectory" || /var/archivematica/sharedDirectory/currentlyProcessing/
 +
|-
 +
| %watchDirectoryPath% || MCPServer config "watchDirectoryPath" || /var/archivematica/sharedDirectory/watchedDirectories/
 +
|-
 +
| %rejectedDirectory% || MCPServer config "rejectedDirectory" || /var/archivematica/sharedDirectory/rejected/
 +
|-
 +
|}
 +
 
 +
Values available for a unit ('''SIP''', '''DIP''' or '''Transfer''') in [[/TaskTypes#Run once |linkTaskManagerDirectories]] and [[/TaskTypes#Generate User Choice in MicroService |linkTaskManagerGetMicroserviceGeneratedListInStdOut]]:
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Example value
 +
|-
 +
| %currentPath% || Path to the unit || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/
 +
|-
 +
| %SIPDirectory% || Duplicate of %currentPath% || Same as %currentPath%
 +
|-
 +
| %SIPDirectoryBasename% || Directory name of the unit || csvmd-6008a7ee-6585-47cc-abf2-387bde530fef
 +
|-
 +
| %SIPLogsDirectory% || Path to the logs directory || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/logs/
 +
|-
 +
| %SIPName% || Name of the unit (directory name with UUID stripped) || csvmd
 +
|-
 +
| %SIPObjectsDirectory% || Path to the objects directory || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/
 +
|-
 +
| %SIPUUID% || UUID of the unit || 6008a7ee-6585-47cc-abf2-387bde530fef
 +
|-
 +
| %relativeLocation% || Duplicate of %currentPath% || Same as %currentPath%
 +
|-
 +
| %transferDirectory% || Duplicate of %currentPath% (SIP only) || Same as %currentPath%
 +
|-
 +
|}
 +
 
 +
Values available for a '''File''' in [[/TaskTypes#Run for each file |linkTaskManagerFiles]] include all Unit values:
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Example value
 +
|-
 +
| %currentLocation% || Current path to file, usually sanitized || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
 +
|-
 +
| %fileDirectory% || Directory the file is in || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects
 +
|-
 +
| %fileGrpUse% || File group, eg original, preservation, etc || original
 +
|-
 +
| %fileUUID% || File UUID from Files table || 3e071275-e6c5-40e5-85e4-ed0cf83006cd
 +
|-
 +
| %originalLocation% || Original file path. May not be unicode. || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
 +
|-
 +
| %relativeLocation% || Duplicate of %currentLocation% || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/
 +
|-
 +
|}
 +
 
 +
'''File'''s also have the following available in FPR scripts.
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Example value
 +
|-
 +
| %fileExtension% || File extension || TGA
 +
|-
 +
| %fileExtensionWithDot% || File extension, with dot || .TGA
 +
|-
 +
| %fileFullName% || Full path to file || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
 +
|-
 +
| %fileName% || File basename, sans extension || MARBLES
 +
|-
 +
| %inputFile% || Full path to file || /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
 +
|-
 +
|}
 +
 
 +
Temporary replacement variables can also be created based on user choices with [[/TaskTypes#Get User Choice - select from MicroService Generated List |get user choice from microservice generated list]].  The '''MicroServiceChoiceReplacementDic''' table contains replacement dicts that are made available to subsequent jobs for replacement.  Examples include:
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Example value
 +
|-
 +
| %IDCommand% || UUID of an FPR command for file identification || a8e45bc1-eb35-4545-885c-dd552f1fde9a
 +
|-
 +
| %AIPCompressionLevel% || Level of compression for an AIP || 5
 +
|-
 +
| %AIPCompressionAlgorithm% || Algorithm to use for compressing an AIP || 7z-bzip2, 7z-lzma, None-
 +
|-
 +
|}
 +
 
 +
== Config File ==
 +
 
 +
Several basic startup settings are read from the config file at <code>/etc/archivematica/MCPServer/serverConfig.conf</code>.
 +
 
 +
Variables in the MCPServer section:
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Default value
 +
|-
 +
| MCPArchivematicaServer || URL of the MCP gearman server. Must match the client config file. || localhost:4730
 +
|-
 +
| GearmanServerWorker || (uncertain) URL of the MCP gearman server for RPCServer. Should match MCPArchivematicaServer || localhost:4730
 +
|-
 +
| sharedDirectory || Directory structure owned by Archivematica and shared between the MCPServer & MCPClient. Must match the client config file. || /var/archivematica/sharedDirectory/
 +
|-
 +
| processingDirectory || Path where units live during processing. Should be inside sharedDirectory. || /var/archivematica/sharedDirectory/currentlyProcessing/
 +
|-
 +
| rejectedDirectory || Path where rejected units live. Should be inside sharedDirectory. || /var/archivematica/sharedDirectory/rejected/
 +
|-
 +
| watchDirectoryPath || Path where watched directories live. Should be inside sharedDirectory. || /var/archivematica/sharedDirectory/watchedDirectories/
 +
|-
 +
| watchDirectoriesPollInterval || Time in seconds to wait between polling the watched directories to check for updates. || 1
 +
|-
 +
| waitOnAutoApprove || Time in seconds to wait before approving based on the processing XML file. || 0
 +
|-
 +
| processingXMLFile || Name of the workflow configuration file. || processingMCP.xml
 +
|-
 +
|}
 +
 
 +
Variables in the Protocol section:
 +
 
 +
{|
 +
|-
 +
! Variable !! Description !! Default value
 +
|-
 +
| limitGearmanConnections || Maximum number of concurrent gearman connections. || 10000
 +
|-
 +
| limitTaskThreads ||  || 75
 +
|-
 +
| limitTaskThreadsSleep ||  || 0.2
 +
|-
 +
| reservedAsTaskProcessingThreads ||  || 8
 +
|-
 +
|}
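A minimal sketch of reading these settings with Python's standard ''configparser'' module follows. It assumes the file uses INI-style <code>[MCPServer]</code> and <code>[Protocol]</code> section headers, which the tables above suggest but do not confirm; the sample text is built from the default values listed and is parsed from a string rather than from the real file.

```python
import configparser

# Sample built from the defaults in the tables above (an assumption about
# the file layout, not a copy of a real serverConfig.conf).
SAMPLE = """
[MCPServer]
MCPArchivematicaServer = localhost:4730
sharedDirectory = /var/archivematica/sharedDirectory/
watchDirectoriesPollInterval = 1

[Protocol]
limitTaskThreads = 75
limitTaskThreadsSleep = 0.2
"""


def load_settings(text):
    """Parse config text and return a few typed settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "gearman_server": parser.get("MCPServer", "MCPArchivematicaServer"),
        "poll_interval": parser.getint("MCPServer", "watchDirectoriesPollInterval"),
        "task_threads": parser.getint("Protocol", "limitTaskThreads"),
        "task_threads_sleep": parser.getfloat("Protocol", "limitTaskThreadsSleep"),
    }
```

To read the real file, ''parser.read(path)'' could be used in place of ''read_string''.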
 +
 
 +
== Old FPR Database Reference ==
 +
 
 +
{| class="wikitable" style="background-color:#ffcccc; font-size: 120%; font-weight: bold; " cellpadding="10"
 +
| This is not accurate in Archivematica 1.0 or greater - file identification and normalization rules have been moved to the new FPR schema.
 +
|}
 +
 
 +
=== FileIDsBySingleID ===
 +
* '''Intent''': Map a tool and its output to a FileID (file format)
 +
* '''Knows''': Tool, tool version, tool output, what FileID that corresponds to
 +
* '''Referenced By''': None
 +
* fileID is a foreign key to FileIDs
 +
 
 +
=== FileIDs ===
 +
* '''Intent''': Describe a file format
 +
* '''Knows''': valid for preservation or access, description, how the file was identified
 +
* '''Referenced By''': CommandRelationships, FileIDsBySingleID, FilesIdentifiedIDs
 +
* fileIDType is a foreign key to FileIDTypes
 +
 
 +
=== FileIDTypes ===
 +
* '''Intent''': Stores a list of all the ways/tools a file can be identified by
 +
* '''Knows''': All the ways a file can be identified
 +
* '''Referenced By''': FileIDs
 +
* Small, ~13 entries
 +
 
 +
=== FilesIdentifiedIDs ===
 +
* '''Intent''': Mapping between File and File ID
 +
* '''Knows''': File, FileID
 +
* '''Referenced By''': None
 +
* fileUUID is a foreign key to Files
 +
* fileID is a foreign key to FileIDs
 +
 
 +
=== Files ===
 +
* '''Intent''': Information about a file
 +
* '''Knows''': UUID, original and current location, SIP or Transfer UUID, etc
 +
* '''Referenced By''': FilesIDs, FilesIdentifiedIDs
 +
 
 +
=== CommandRelationships ===
 +
* aka "Format Policy Rule"
 +
* '''Intent''': Map between file ID, command classification and command
 +
* '''Knows''': fileID, commandClassification, Command, statistics on success/failure
 +
* '''Referenced By''': None
 +
* Effectively has a three part primary key - fileID, commandClassification and command
 +
* fileID is a foreign key to FileIDs
 +
* commandClassification is a foreign key to CommandClassifications
 +
* command is a foreign key to Commands
 +
 
 +
=== Commands ===
 +
* '''Intent''': Information about the command
 +
* '''Knows''': command itself, verification and event detail commands, output location and format
 +
* '''Referenced By''':
 +
* commandType is a foreign key to CommandTypes
 +
* eventDetailCommand and verificationCommand are foreign keys to Commands (itself)
 +
 
 +
=== CommandTypes ===
 +
* '''Intent''': Type of command being run
 +
* '''Knows''': All possible types of commands
 +
* '''Referenced By''': CommandRelationships
 +
* Small, ~3 entries (bashScript, pythonScript, command [line])
 +
 
 +
=== CommandClassifications ===
 +
* '''Intent''': Classification (access, preservation etc) of the command being run
 +
* '''Knows''': All possible classifications
 +
* '''Referenced By''':
 +
* Small, ~4 entries - thumbnail, access, preservation, extraction
 +
** access: 3141bc6f-7f77-4809-9244-116b235e7330
 +
** preservation: 3d1b570f-f500-4b3c-bbbc-4c58aad05c27
 +
** thumbnail: 27c2969b-b6a0-441d-888d-85292b692064
 +
 
 +
=== DefaultCommandsForClassifications ===
 +
* '''Intent''': Default action for a given classification
 +
* '''Knows''': command classification, chain link to run
 +
* '''Referenced By''':
 +
* forClassification is a foreign key to CommandClassifications
 +
* MicroserviceChainLink is a foreign key to MicroserviceChainLinks
 +
 
 +
==Debugging==
 +
 
 +
Archivematica logs are stored in <code>/var/log/archivematica/</code>, and separated by project. All logs are rotated and old logs are deleted automatically.
 +
 
 +
=== Dashboard===
 +
* '''Location''': <code>/var/log/archivematica/dashboard</code>
 +
* '''Contains''': Logging from Django, GUI, AJAX calls, start transfer, proxying to storage service, etc
 +
* '''Config''': <code>archivematica/src/dashboard/src/settings/common.py</code>
 +
* '''Files''':
 +
** <code>dashboard.log</code>: INFO and higher logs
 +
** <code>dashboard.debug.log</code>: DEBUG logging for above
 +
 
 +
=== MCPClient ===
 +
* '''Location''': <code>/var/log/archivematica/MCPClient</code>
 +
* '''Contains''': Logging from the MCPClient and client scripts
 +
* '''Config''':
 +
** <code>archivematica/src/MCPClient/lib/archivematicaClient.py</code> for MCPClient
 +
** <code>archivematica/src/archivematicaCommon/lib/custom_handlers.py</code> for client_scripts.log
 +
* '''Files''':
 +
** <code>MCPClient.log</code>: Logging from MCPClients that listen to gearman and run client scripts
 +
** <code>MCPClient.debug.log</code>: DEBUG logging for above
 +
** <code>client_scripts.log</code>: Logging from the client scripts themselves
 +
 
 +
=== MCPServer ===
 +
* '''Location''': <code>/var/log/archivematica/MCPServer</code>
 +
* '''Contains''': Logging from the MCPServer, MicroServiceChainLinks, sending jobs to gearman
 +
* '''Config''': <code>archivematica/src/MCPServer/lib/archivematicaMCP.py</code>
 +
* '''Files''':
 +
** <code>MCPServer.log</code>: INFO and higher logs
 +
** <code>MCPServer.debug.log</code>: DEBUG logging for above
 +
 
 +
=== Storage Service ===
 +
* '''Location''': <code>/var/log/archivematica/storage-service</code>. The storage service might not be installed on the same machine.
 +
* '''Contains''': Logging from the storage service
 +
* '''Config''': Storage Service <code>archivematica-storage-service/storage_service/storage_service/settings/base.py</code>
 +
* '''Files''':
 +
** <code>storage_service.log</code>: INFO and higher logs
 +
** <code>storage_service_debug.log</code>: DEBUG logging for above
 +
 
 +
=== Prior to 1.4 ===
 +
 
 +
'''WARNING This is deprecated.  See above for logging in Archivematica 1.4 and later.'''
 +
 
 +
Debugging the MCP can be a difficult task.
 +
Logs can be large, and are placed in the /tmp/ directory, so they are automatically removed upon reboot.
 +
 
 +
====Parsing Logs====
 +
Here are some commands to help parse logs:
 +
<pre>grep "DEBUG type=\"archivematicaMCP\"" -v /tmp/archivematicaMCPServer* -h > /tmp/archivematicaOutput.txt </pre>
 +
Removes the periodic debug message prints.
 +
<pre>grep "Traceback (most recent call last):" /tmp/archivematicaOutput.txt  -n</pre>
 +
<pre>grep -i EXCEPTION /tmp/archivematicaMCPServer-* -n</pre>
 +
-n will prepend the line number
 +
<pre>sed -n '302092,+50'p /tmp/archivematicaMCPServer-*</pre>
 +
prints line 302092 and the 50 lines that follow it. This is useful to look at sections of the log that have exceptions, which can be found with the command above.
 +
 
 +
====debugging tools====
 +
In extreme cases, you can set up your dev environment so that you log in as the archivematica user and use Eclipse with PyDev in debug mode to run the MCP.
 +
 
 +
====what clients are connected====
 +
<pre>python -c '
 +
import gearman
 +
admin = gearman.admin_client.GearmanAdminClient(host_list=["127.0.0.1"])
 +
for client in admin.get_workers():
 +
    if client["client_id"] != "-": #exclude server task connections
 +
        print client["client_id"], client["ip"]
 +
 
 +
for stat in admin.get_status():
 +
    if stat["running"] != 0 or stat["queued"] != 0:
 +
        print stat
 +
'
 +
</pre>
 +
 
 +
 
 +
====Watching activity====
 +
<pre>tail /tmp/archivematicaMCP* -f</pre>
 +
<pre>watch mysql -u root MCP --execute "\"SELECT * FROM Tasks WHERE endTime = 0;\""</pre>
 +
 
 +
====Turning on printing all sql queries====
 +
sudo nano /usr/lib/archivematica/archivematicaCommon/databaseInterface.py
 +
:http://code.google.com/p/archivematica/source/browse/tags/release-0.8-alpha/src/archivematicaCommon/lib/databaseInterface.py
 +
:edit lines 34 and 73
 +
:"printSQL = False" -> "printSQL = True"
 +
:"        print printSQL" -> "        print sql"
 +
 
 +
This will cause Archivematica to print ALL of the queries it issues to the database.
  
<descriptionWhileExecuting> </descriptionWhileExecuting>
 
To skip this command, and not execute it at all.
 
<skip>Yes/No</skip>
 
Filter what files/folder the command will operate on.
 
<filterFileEnd></filterFileEnd>
 
<filterFileStart></filterFileStart>
 
<filterSubDir></filterSubDir>
 
If the output of the command is all going to the same file, multiple threads may try to write to the same file simultaneously and cause a collision. Set this option to stop that occurrence.
 
<requiresOutputLock>No</requiresOutputLock>
 
Not used - placeholder
 
<standardIn></standardIn>
 
File to write standard out to.
 
<standardOut></standardOut>
 
File to write standard error to.
 
<standardError></standardError>
 
  
<failureNotification> </failureNotification>
 
Define the command the client is to execute. This will need to match an entry in the client's supported modules defined in [[ archivematicaClientConfig ]]
 
<execute> </execute>
 
The arguments to give the command.
 
<arguments> </arguments>
 
Does this command execute once on the SIP, or once for each file?
 
<executeOnEachFile>yes/no</executeOnEachFile>
 
</command>
 
  
==Client==
 
===Client Requirements===
 
===Client Implementation===
 
 
[[Category:Development documentation]]
 

Latest revision as of 15:58, 11 February 2020

This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

The MCP Server is the core of the Archivematica system. It controls the various micro-services in the Archivematica system. Configuration and processing information are held in the database. The user monitors and controls the status and workflow via the dashboard. The MCP Server maintains a log of all completed work in the database and log files.

Microservices are run in chains. On startup, Archivematica checks all the watched directories it is aware of. For each file in those directories, it creates a Job Chain and an associated unit (SIP, DIP or Transfer). The Job Chain runs until completion, which usually puts the Transfer/SIP in another directory, to start another chain.

The MCP uses gearman as a task manager. The MCP Clients are relatively "dumb": they are gearman worker implementations that tell the gearman server which tasks they can perform, then wait for the server to assign them a task.

The Archivematica system relies on client and server having access to the same directory, to process the commands. On a distributed system, this is done through the shared directory.

All files for the MCPServer are found in src/MCPServer/lib/, and all files for the MCPClient are found in src/MCPClient/lib/.


MCP Server

The MCPServer tracks the directories defined in WatchedDirectories. When it sees new files in a watched directory, it starts the associated MicroServiceChains entry, which knows the first MicroServiceChainLinks entry to run.

The workflow of Archivematica is defined by the MicroServiceChainLinks table. These, in conjunction with TasksConfigs and StandardTasksConfigs, determine what script to run and how to run it. StandardTasksConfigs lists the script name and parameters. TasksConfigs defines how the script will be run (once for the SIP, once per file, etc. See #TaskTypes for more details) and what text is displayed in the dashboard. MicroServiceChainLinks and MicroServiceChainLinksExitCodes control what link is run next.

When a link is run, it returns an exit code. If the exit code is listed in MicroServiceChainLinksExitCodes for that link, the workflow goes to MicroServiceChainLinksExitCodes.nextMicroServiceChainLink. If not, the next link run is MicroServiceChainLinks.defaultNextChainLink. Because of this, defaultNextChainLink is often the failure case. If the defaultNextChainLink is the 'Failed Transfer' chain (starting link: 61c316a6-0a50-4f65-8767-1f44b1eeb6dd) or the 'Failed SIP' chain (starting link: 7d728c39-395f-4892-8193-92f086c0546f), then the SIP has 'failed' from a user perspective. Not all MicroServiceChainLinks are considered critical, and sometimes the MicroServiceChainLinks.defaultNextChainLink is the same as the MicroServiceChainLinksExitCodes.nextMicroServiceChainLink.
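
The exit-code lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the two dictionaries are hypothetical in-memory stand-ins for the MicroServiceChainLinksExitCodes and MicroServiceChainLinks tables, and next_link is not a function from the Archivematica code base.

```python
# Hypothetical stand-in for MicroServiceChainLinksExitCodes:
# (microServiceChainLink, exitCode) -> nextMicroServiceChainLink
EXIT_CODES = {
    ("link-a", 0): "link-b",
}

# Hypothetical stand-in for MicroServiceChainLinks:
# link -> defaultNextChainLink (None means the chain ends here)
DEFAULT_NEXT = {
    "link-a": "failed-transfer-start",
    "link-b": None,
}


def next_link(link_id, exit_code, exit_codes=EXIT_CODES, default_next=DEFAULT_NEXT):
    """Pick the next link: an explicit exit-code match wins, else the default."""
    key = (link_id, exit_code)
    if key in exit_codes:
        return exit_codes[key]
    return default_next[link_id]
```

With this data, exit code 0 from link-a continues to link-b, while any other exit code falls through to the default link, which here plays the role of the failure chain.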

When a link is run for a particular Transfer or SIP, Jobs and sometimes Tasks are created to track this. A Job represents the execution of a MicroServiceChainLinks entry. Most task types run no client scripts and are strictly for workflow control, so they have no Tasks. Three task types (run once, run for each file, and generate user choice in microservice) run a client script and track the output in Tasks that link back to the same Job. These task types account for the majority of Jobs run. When you view details for the Normalization link, for example, you're looking at the normalization Job, which aggregates data from all the Tasks that were run. All the Tasks can run in parallel (up to the number of MCPClients that are running), but a Job won't complete until all of its Tasks are done. When that happens, it will move on to the next link.
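
The relationship between a Job and its Tasks can be illustrated with a small thread-pool sketch. It is not how the MCPServer dispatches work (that goes through gearman); run_job, run_task and client_count are hypothetical names, and taking the highest Task exit code as the Job result is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(task_inputs, run_task, client_count=2):
    """Run one Task per input in parallel and return only when all are done.

    client_count stands in for the number of running MCPClients, which
    bounds how many Tasks execute at the same time.
    """
    with ThreadPoolExecutor(max_workers=client_count) as pool:
        # list() forces the Job to wait for every Task before continuing.
        exit_codes = list(pool.map(run_task, task_inputs))
    # Assumption: the Job's exit code is the highest Task exit code.
    return max(exit_codes) if exit_codes else 0
```

Only after run_job returns would the chain look up the next link for the resulting exit code.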

Startup

  • archivematicaMCP.py looks at all watched directories in WatchedDirectories
    • table WatchedDirectories, file src/MCPServer/lib/archivematicaMCP.py
  • for each file in each directory, create unit (SIP, DIP, Transfer) and job chain
    • functions watchDirectories, createUnitAndJobChainThreaded, createUnitAndJobChain
  • See below for how a job chain works
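
The startup scan above can be sketched roughly as follows. This is a minimal illustration, not the code in archivematicaMCP.py: scan_watched_directories, its (path, chain, expected type) tuples and the on_found callback are hypothetical names standing in for rows of WatchedDirectories and for createUnitAndJobChain.

```python
import os


def scan_watched_directories(watched, on_found):
    """Call on_found once for every entry found in each watched directory.

    watched is a list of (path, chain_id, expected_type) tuples, a
    hypothetical stand-in for rows of the WatchedDirectories table.
    """
    for path, chain_id, expected_type in watched:
        for entry in sorted(os.listdir(path)):
            # In the real server this is where a unit and job chain are created.
            on_found(os.path.join(path, entry), chain_id, expected_type)
```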

Tasks Workflow/Job Chains

  • jobChain.py looks up its first link in MicroServiceChains.startingLink and starts it
  • jobChainLink.py finds its task in MicroServiceChainLinks.currentTask
  • jobChainLink.py continues and looks up what type of task it is in TasksConfigs.taskType, instantiates a LinkTaskManager*.py class based on that and passes class-specific info from TasksConfigs.taskTypePKReference
  • Most LinkTaskManager*.py look up the configuration for the task in StandardTasksConfigs using TasksConfigs.taskTypePKReference, create the actual task and hand it to Gearman to run
    • Replacement variables (%foo%) are replaced in the LinkTaskManager*.py. Replacements are defined in dicts.py in archivematicaCommon, and in MicroServiceChoiceReplacementDic.replacementDic. See #Replacement Variables for details.
    • Other LinkTaskManager*.py use TasksConfigs.taskTypePKReference as a foreign key to another table, or do not use it at all. See the #TaskTypes section and #Database Reference for more information
  • Gearman runs the tasks, and collects the exit code, stdout and stderr, and calls back to LinkTaskManager*.py.taskCompletedCallBackFunction, which calls jobChainLink.py.linkProcessingComplete
  • jobChainLink.py looks in MicroServiceChainLinksExitCodes for the pairing of MicroServiceChainLink and exitCode, to find nextMicroServiceChainLink
    • If nothing is found, use MicroServiceChainLinks.defaultNextChainLink
  • jobChain.py gets the UUID of the next MicroServiceChainLinks, and starts the next jobChainLink (see above)
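
The step where jobChainLink.py picks a LinkTaskManager class from TasksConfigs.taskType can be pictured as a lookup table from task type to class. The sketch below is an assumption-laden illustration: the class names, the dictionary and manager_for are hypothetical, and the real taskType column holds a key into TaskTypes rather than a descriptive string.

```python
class RunOnceManager(object):
    """Stand-in for a manager that runs one client script for the whole unit."""

    def __init__(self, pk_reference):
        # For this task type the reference identifies a StandardTasksConfigs row.
        self.standard_tasks_config = pk_reference


class ChoiceManager(object):
    """Stand-in for a workflow-control manager that waits for a user choice."""

    def __init__(self, pk_reference):
        # Workflow-control types may not use the reference at all.
        self.standard_tasks_config = None


# Hypothetical mapping from task type to manager class.
MANAGERS_BY_TASK_TYPE = {
    "run once": RunOnceManager,
    "get user choice to proceed with": ChoiceManager,
}


def manager_for(tasks_config):
    """Instantiate the manager class that matches the row's taskType."""
    manager_class = MANAGERS_BY_TASK_TYPE[tasks_config["taskType"]]
    return manager_class(tasks_config["taskTypePKReference"])
```

The point of the design is that taskTypePKReference only has meaning once taskType has selected the class that interprets it.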

Task Types

This has been moved to a task types sub page


Database Reference

Database Schema Diagram

The MCP Modules are configured in the database, with the following schema. (Generated using MySQL Workbench, which can be installed with sudo apt-get install mysql-workbench.)

MCP configuration database schema.png

MicroServiceChains

  • Intent: Entry point into chains
  • Knows: Description to display (if the user has to choose it), starting chain link
  • Referenced by: MicroServiceChainChoice, WatchedDirectories
  • startingLink is foreign key to MicroServiceChainLinks
  • configuration for 'get user choice to proceed with'

MicroServiceChainLinks

  • Intent: The task, when/how to do it, and where to go next. Often referenced
  • Knows: currentTask, default/failed next link
  • Referenced by: Jobs, MicroServiceChainChoice, MicroServiceChainLinks (itself), MicroServiceChainLinksExitCodes, MicroServiceChains, MicroServiceChoiceReplacementDic, SIPs, Transfers
  • currentTask is foreign key to TasksConfigs

TasksConfigs

  • aka "the weird table"
  • Intent: Starting point to find configs for a link
  • Knows: taskType, taskTypePKReference (aka class specific information), Job description presented to user
  • Referenced by: MicroServiceChainLinks
  • taskType is a foreign key to TaskTypes
  • taskTypePKReference is semantically a foreign key to another table (determined by taskType) (often StandardTasksConfigs)
  • taskType determines what table to look at; taskTypePKReference is semantically a foreign key to a row in the specified table

TaskTypes

  • Intent: Define all the ways tasks can be run and, generically, what they do, e.g. run once, run once for each file, give user choices, etc.
  • Knows: N/A
  • Referenced by: TasksConfigs
  • Each TaskType is associated with a linkTaskManager* class
  • More detail

MicroServiceChainLinksExitCodes

  • Intent: Show which ChainLink to execute next
  • Knows: chainLink associated with it, an exit code, where to go with that
  • Referenced by: None
  • microServiceChainLink is a foreign key to MicroServiceChainLinks
  • nextMicroServiceChainLink is also a foreign key to MicroServiceChainLinks
  • Usually there's only one exit code defined for a MicroServiceChainLinks (usually completed successfully), but can also be used to branch workflow depending on result (exit code) of a task.

StandardTasksConfigs

WatchedDirectories

  • Intent: Knows what directories to watch, and what to do when something happens
  • Knows: watched path, what to do, expected unit type (e.g. SIP, DIP, Transfer)
  • Referenced by: None
  • chain is a foreign key to MicroServiceChains
  • expectedType is a foreign key to WatchedDirectoriesExpectedTypes

MicroServiceChainChoice

  • Intent: Configuration for taskType 'get user choice to proceed with', selection of a chain to process
  • Knows: Chain Link that the choice comes from, chain to go to if selected
  • Referenced by: None
  • choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
  • chainAvailable is a foreign key to MicroServiceChains
  • configuration for 'get user choice to proceed with' (linkTaskManagerChoice.py)

MicroServiceChoiceReplacementDic

  • Intent: Configuration for taskType 'get replacement dic from user choice'
  • Knows: Chain Link that the choice comes from, text to display, and a dict of {replacement_string : code_thing }
  • Referenced by: None
  • choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
  • configuration for 'get replacement dic from user choice' (linkTaskManagerReplacementDicFromChoice.py)

TasksConfigsSetUnitVariable

  • Intent: Configuration for taskType 'linkTaskManagerSetUnitVariable'
  • Knows: Variable name, value to set for variable (either variableValue or microServiceChainLink)
  • Referenced by: None
  • microServiceChainLink is a foreign key to MicroServiceChainLinks
  • configuration for 'linkTaskManagerSetUnitVariable' (linkTaskManagerSetUnitVariable.py)

TasksConfigsUnitVariableLinkPull

  • Intent: Configuration for taskType 'linkTaskManagerUnitVariableLinkPull'
  • Knows: Variable name, default MicroServiceChainLink to go to if no variable found
  • Referenced by: None
  • defaultMicroServiceChainLink is a foreign key to MicroServiceChainLinks
  • configuration for 'linkTaskManagerUnitVariableLinkPull' (linkTaskManagerUnitVariableLinkPull.py)

TasksConfigsStartLinkForEachFile

  • Intent: Configuration for taskType 'Split Job into many links based on file ID'
  • Knows: Normalization chain to run, folder to run on
  • Referenced by: None
  • execute is a foreign key to MicroServiceChains
  • configuration for 'Split Job into many links based on file ID' (linkTaskManagerSplitOnFileIdAndruleset.py)

TasksConfigsAssignMagicLink - Deprecated

  • Intent: Configuration for taskType 'assign magic link'
  • Knows: MicroServiceChainLink to run next time Load Magic Link is called for that unit.
  • Referenced by: None
  • execute is a foreign key to MicroServiceChainLinks
  • configuration for 'assign magic link' ('linkTaskManagerAssignMagicLink'.py)

Replacement Variables

In StandardTasksConfigs, the arguments field can be populated with placeholder values structured like %name%. When the script is run, these placeholders are replaced with the actual values. For example, %SIPUUID% is replaced with the actual SIP UUID, such as 962c7299-6e37-4e67-acdf-800c5b6fbee4.

The job types that perform a replacement on the values are linkTaskManagerDirectories (unit values), linkTaskManagerFiles (file values) and linkTaskManagerGetMicroserviceGeneratedListInStdOut (unit values). These values are populated by ReplacementDict.frommodel in src/archivematicaCommon/lib/dicts.py.
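
As a rough illustration of the substitution itself, the sketch below replaces %name% placeholders in an arguments string from a dictionary. It is not the code in dicts.py, which may work differently; replace_variables is a hypothetical helper, and leaving unknown placeholders untouched is an assumption made for the example.

```python
import re


def replace_variables(arguments, replacements):
    """Substitute %name% placeholders; unknown names are left untouched."""

    def substitute(match):
        placeholder = match.group(0)
        return replacements.get(placeholder, placeholder)

    return re.sub(r"%\w+%", substitute, arguments)
```

For example, replace_variables('"%SIPDirectory%" "%SIPUUID%"', values) fills in both arguments when values contains the keys "%SIPDirectory%" and "%SIPUUID%".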

Values always available:

Variable | Value source | Example value
%processingDirectory% | MCPServer config "processingDirectory" | /var/archivematica/sharedDirectory/currentlyProcessing/
%watchDirectoryPath% | MCPServer config "watchDirectoryPath" | /var/archivematica/sharedDirectory/watchedDirectories/
%rejectedDirectory% | MCPServer config "rejectedDirectory" | /var/archivematica/sharedDirectory/rejected/

Values available for a unit (SIP, DIP or Transfer) in linkTaskManagerDirectories and linkTaskManagerGetMicroserviceGeneratedListInStdOut:

Variable | Description | Example value
%currentPath% | Path to the unit | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/
%SIPDirectory% | Duplicate of %currentPath% | Same as %currentPath%
%SIPDirectoryBasename% | Directory name of the unit | csvmd-6008a7ee-6585-47cc-abf2-387bde530fef
%SIPLogsDirectory% | Path to the logs directory | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/logs/
%SIPName% | Name of the unit (directory name with UUID stripped) | csvmd
%SIPObjectsDirectory% | Path to the objects directory | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/
%SIPUUID% | UUID of the unit | 6008a7ee-6585-47cc-abf2-387bde530fef
%relativeLocation% | Duplicate of %currentPath% | Same as %currentPath%
%transferDirectory% | Duplicate of %currentPath% (SIP only) | Same as %currentPath%
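
The relationships between these unit values can be shown by deriving them from a single path. This is only a sketch: unit_variables is a hypothetical helper, and parsing the UUID out of the directory name is an assumption for illustration, since the real values are populated from the database model.

```python
import posixpath
import re

# Matches a trailing "-<uuid>" on the unit's directory name.
UUID_SUFFIX = re.compile(
    r"-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def unit_variables(current_path):
    """Derive the unit replacement values from the unit's directory path."""
    basename = posixpath.basename(current_path.rstrip("/"))
    match = UUID_SUFFIX.search(basename)
    return {
        "%currentPath%": current_path,
        "%SIPDirectory%": current_path,
        "%SIPDirectoryBasename%": basename,
        # Joining with a trailing "" keeps the final slash on the directory.
        "%SIPLogsDirectory%": posixpath.join(current_path, "logs", ""),
        "%SIPObjectsDirectory%": posixpath.join(current_path, "objects", ""),
        "%SIPName%": basename[: match.start()] if match else basename,
        "%SIPUUID%": match.group(1) if match else "",
    }
```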

Values available for a File in linkTaskManagerFiles include all Unit values:

Variable | Description | Example value
%currentLocation% | Current path to file, usually sanitized | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
%fileDirectory% | Directory the file is in | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects
%fileGrpUse% | File group, eg original, preservation, etc | original
%fileUUID% | File UUID from Files table | 3e071275-e6c5-40e5-85e4-ed0cf83006cd
%originalLocation% | Original file path. May not be unicode. | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
%relativeLocation% | Duplicate of %currentLocation% | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/

Files also have the following available in FPR scripts.

Variable | Description | Example value
%fileExtension% | File extension | TGA
%fileExtensionWithDot% | File extension, with dot | .TGA
%fileFullName% | Full path to file | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
%fileName% | File basename, sans extension | MARBLES
%inputFile% | Full path to file | /var/archivematica/sharedDirectory/currentlyProcessing/csvmd-6008a7ee-6585-47cc-abf2-387bde530fef/objects/MARBLES.TGA
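
The file values in this table all follow from the file's path, as the following sketch shows. file_variables is a hypothetical helper written for this page, not a function from the FPR or archivematicaCommon code.

```python
import posixpath


def file_variables(current_location):
    """Derive the per-file replacement values from the file's current path."""
    directory, full_name = posixpath.split(current_location)
    name, extension_with_dot = posixpath.splitext(full_name)
    return {
        "%fileFullName%": current_location,
        "%inputFile%": current_location,
        "%fileDirectory%": directory,
        "%fileName%": name,
        "%fileExtensionWithDot%": extension_with_dot,
        # Drop the leading dot to get the bare extension.
        "%fileExtension%": extension_with_dot[1:],
    }
```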

Temporary replacement variables can also be created based on user choices with get user choice from microservice generated list. The MicroServiceChoiceReplacementDic table contains replacement dicts that are made available to subsequent jobs for replacement. Examples include:

Variable | Description | Example value
%IDCommand% | UUID of an FPR command for file identification | a8e45bc1-eb35-4545-885c-dd552f1fde9a
%AIPCompressionLevel% | Level of compression for an AIP | 5
%AIPCompressionAlgorithm% | Algorithm to use for compressing an AIP | 7z-bzip2, 7z-lzma, None-

Config File

Several basic startup settings are read from the config file at /etc/archivematica/MCPServer/serverConfig.conf.

Variables in the MCPServer section:

Variable | Description | Default value
MCPArchivematicaServer | URL of the MCP gearman server. Must match the client config file. | localhost:4730
GearmanServerWorker | (uncertain) URL of the MCP gearman server for RPCServer. Should match MCPArchivematicaServer | localhost:4730
sharedDirectory | Directory structure owned by Archivematica and shared between the MCPServer & MCPClient. Must match the client config file. | /var/archivematica/sharedDirectory/
processingDirectory | Path where units during processing live. Should be inside sharedDirectory | /var/archivematica/sharedDirectory/currentlyProcessing/
rejectedDirectory | Path where rejected units live. Should be inside sharedDirectory. | /var/archivematica/sharedDirectory/rejected/
watchDirectoryPath | Path where watched directories live. Should be inside sharedDirectory. | /var/archivematica/sharedDirectory/watchedDirectories/
watchDirectoriesPollInterval | Time in seconds to wait between polling the watched directories to check for updates. | 1
waitOnAutoApprove | Time in seconds to wait before approving based on the processing XML file. | 0
processingXMLFile | Name of the workflow configuration file. | processingMCP.xml

Variables in the Protocol section:

Variable | Description | Default value
limitGearmanConnections | Maximum number of concurrent gearman connections. | 10000
limitTaskThreads | | 75
limitTaskThreadsSleep | | 0.2
reservedAsTaskProcessingThreads | | 8
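
A few of these settings can be read with Python's configparser, assuming the file uses INI-style sections as the MCPServer and Protocol section names suggest. The sample text and load_settings below are illustrative only; the values are the defaults listed above, and the returned dictionary keys are hypothetical.

```python
import configparser

# Sample contents using the default values listed in the tables above.
SAMPLE = """
[MCPServer]
MCPArchivematicaServer = localhost:4730
sharedDirectory = /var/archivematica/sharedDirectory/
watchDirectoriesPollInterval = 1
processingXMLFile = processingMCP.xml

[Protocol]
limitGearmanConnections = 10000
limitTaskThreads = 75
limitTaskThreadsSleep = 0.2
"""


def load_settings(text):
    """Parse the config text and return a few typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "gearman_server": parser.get("MCPServer", "MCPArchivematicaServer"),
        "poll_interval": parser.getint("MCPServer", "watchDirectoriesPollInterval"),
        "task_threads": parser.getint("Protocol", "limitTaskThreads"),
        "task_sleep": parser.getfloat("Protocol", "limitTaskThreadsSleep"),
    }
```

To read the installed file instead of the sample, parser.read("/etc/archivematica/MCPServer/serverConfig.conf") can be used in place of read_string.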

Old FPR Database Reference

This is not accurate in Archivematica 1.0 or greater - file identification and normalization rules have been moved to the new FPR schema.

FileIDsBySingleID

  • Intent: Map a tool and its output to a FileID (file format)
  • Knows: Tool, tool version, tool output, what FileID that corresponds to
  • Referenced By: None
  • fileID is a foreign key to FileIDs

FileIDs

  • Intent: Describe a file format
  • Knows: valid for preservation or access, description, how the file was identified
  • Referenced By: CommandRelationships, FileIDsBySingleID, FilesIdentifiedIDs
  • fileIDType is a foreign key to FileIDTypes

FileIDTypes

  • Intent: Stores a list of all the ways/tools a file can be identified by
  • Knows: All the ways a file can be identified
  • Referenced By: FileIDs
  • Small, ~13 entries

FilesIdentifiedIDs

  • Intent: Mapping between File and File ID
  • Knows: File, FileID
  • Referenced By: None
  • fileUUID is a foreign key to Files
  • fileID is a foreign key to FileIDs

Files

  • Intent: Information about a file
  • Knows: UUID, original and current location, SIP or Transfer UUID, etc
  • Referenced By: FilesIDs, FilesIdentifiedIDs

CommandRelationships

  • aka "Format Policy Rule"
  • Intent: Map between file ID, command classification and command
  • Knows: fileID, commandClassification, Command, statistics on success/failure
  • Referenced By: None
  • Effectively has a three part primary key - fileID, commandClassification and command
  • fileID is a foreign key to FileIDs
  • commandClassification is a foreign key to CommandClassifications
  • command is a foreign key to Commands

Commands

  • Intent: Information about the command
  • Knows: command itself, verification and event detail commands, output location and format
  • Referenced By:
  • commandType is a foreign key to CommandTypes
  • eventDetailCommand and verificationCommand are foreign keys to Commands (itself)

CommandTypes

  • Intent: Type of command being run
  • Knows: All possible types of commands
  • Referenced By: CommandRelationships
  • Small, ~3 entries (bashScript, pythonScript, command [line])

CommandClassifications

  • Intent: Classification (access, preservation etc) of the command being run
  • Knows: All possible classifications
  • Referenced By:
  • Small, ~4 entries - thumbnail, access, preservation, extraction
    • access: 3141bc6f-7f77-4809-9244-116b235e7330
    • preservation: 3d1b570f-f500-4b3c-bbbc-4c58aad05c27
    • thumbnail: 27c2969b-b6a0-441d-888d-85292b692064

DefaultCommandsForClassifications

  • Intent: Default action for a given classification
  • Knows: command classification, chain link to run
  • Referenced By:
  • forClassification is a foreign key to CommandClassifications
  • MicroserviceChainLink is a foreign key to MicroserviceChainLinks

Debugging

Archivematica logs are stored in /var/log/archivematica/, and separated by project. All logs are rotated and old logs are deleted automatically.

Dashboard

  • Location: /var/log/archivematica/dashboard
  • Contains: Logging from Django, GUI, AJAX calls, start transfer, proxying to storage service, etc
  • Config: archivematica/src/dashboard/src/settings/common.py
  • Files:
    • dashboard.log: INFO and higher logs
    • dashboard.debug.log: DEBUG logging for above

MCPClient

  • Location: /var/log/archivematica/MCPClient
  • Contains: Logging from the MCPClient and client scripts
  • Config:
    • archivematica/src/MCPClient/lib/archivematicaClient.py for MCPClient
    • archivematica/src/archivematicaCommon/lib/custom_handlers.py for client_scripts.log
  • Files:
    • MCPClient.log: Logging from MCPClients that listen to gearman and run client scripts
    • MCPClient.debug.log: DEBUG logging for above
    • client_scripts.log: Logging from the client scripts themselves

MCPServer

  • Location: /var/log/archivematica/MCPServer
  • Contains: Logging from the MCPServer, MicroServiceChainLinks, sending jobs to gearman
  • Config: archivematica/src/MCPServer/lib/archivematicaMCP.py
  • Files:
    • MCPServer.log: INFO and higher logs
    • MCPServer.debug.log: DEBUG logging for above
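
The two-file layout (one log for INFO and higher, one debug log) can be reproduced with the standard logging module. This is a sketch rather than the configuration in archivematicaMCP.py: configure_logging is a hypothetical helper, the rotation sizes are made-up values, and it assumes the debug file receives DEBUG and all higher levels.

```python
import logging
import logging.handlers
import os


def configure_logging(log_dir, name="MCPServer"):
    """Attach an INFO+ file and a DEBUG+ file to the named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s %(asctime)s %(name)s: %(message)s")
    for suffix, level in ((".log", logging.INFO), (".debug.log", logging.DEBUG)):
        # Rotation limits below are illustrative assumptions.
        handler = logging.handlers.RotatingFileHandler(
            os.path.join(log_dir, name + suffix), maxBytes=4 * 1024 * 1024, backupCount=5
        )
        handler.setLevel(level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```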

Storage Service

  • Location: /var/log/archivematica/storage-service (the storage service might not be installed on the same machine)
  • Contains: Logging from the storage service
  • Config: archivematica-storage-service/storage_service/storage_service/settings/base.py
  • Files:
    • storage_service.log: INFO and higher logs
    • storage_service_debug.log: DEBUG logging for above

Prior to 1.4

WARNING: This is deprecated. See above for logging in Archivematica 1.4 and later.

Debugging the MCP can be a difficult task. Logs can be large, and are placed in the /tmp/ directory, so they are automatically removed upon reboot.

Parsing Logs

Here are some commands to help parse logs:

grep "DEBUG type=\"archivematicaMCP\"" -v /tmp/archivematicaMCPServer* -h > /tmp/archivematicaOutput.txt 

Removes the periodic debug message prints.

grep "Traceback (most recent call last):" /tmp/archivematicaOutput.txt  -n
grep -i EXCEPTION /tmp/archivematicaMCPServer-* -n

-n prepends the line number to each matching line.

sed -n '302092,+50'p /tmp/archivematicaMCPServer-*

prints line 302092 and the 50 lines that follow it. This is useful for looking at sections of the log that contain exceptions, which can be found with the commands above.
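
The grep and sed steps can also be combined in a few lines of Python, which avoids looking up line numbers by hand. The helper below is a hypothetical convenience, not part of Archivematica; it works on a list of log lines that has already been read into memory.

```python
def traceback_contexts(lines, marker="Traceback (most recent call last):", after=50):
    """Yield (line_number, context) for each line containing the marker.

    Line numbers are 1-based, like grep -n. The context holds the matching
    line plus up to `after` following lines, like sed -n 'N,+50p'.
    """
    for index, line in enumerate(lines):
        if marker in line:
            yield index + 1, lines[index : index + after + 1]
```

For example, calling it on open(path).read().splitlines() for one of the log files yields each traceback together with the lines that follow it.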

Debugging tools

In extreme cases, you can set up your development environment so that you log in as the archivematica user and run the MCP in debug mode using Eclipse with PyDev.

What clients are connected

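python -c '
import gearman

# Connect to the gearman server on the local machine
admin = gearman.admin_client.GearmanAdminClient(host_list=["127.0.0.1"])

# List the connected workers (MCPClients)
for client in admin.get_workers():
    if client["client_id"] != "-":  # exclude server task connections
        print("%s %s" % (client["client_id"], client["ip"]))

# Show only the tasks that are currently running or queued
for stat in admin.get_status():
    if stat["running"] != 0 or stat["queued"] != 0:
        print(stat)
'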


Watching activity

tail /tmp/archivematicaMCP* -f
watch mysql -u root MCP --execute "\"SELECT * FROM Tasks WHERE endTime = 0;\""

Turning on printing of all SQL queries

sudo nano /usr/lib/archivematica/archivematicaCommon/databaseInterface.py

http://code.google.com/p/archivematica/source/browse/tags/release-0.8-alpha/src/archivematicaCommon/lib/databaseInterface.py
edit lines 34 and 73
"printSQL = False" -> "printSQL = True"
" print printSQL" -> " print sql"

This will cause Archivematica to print ALL of the queries it issues to the database.