MCPServer
Revision as of 12:39, 16 October 2013
The MCP is the core of the Archivematica system. It controls the various micro-services in the Archivematica system. Configuration and processing information are held in the database. The user monitors and controls the MCP via the dashboard. The MCP maintains a log of all completed work.
Microservices are run in chains. On startup, Archivematica checks all the watched directories it is aware of. For each file in those directories, it creates a Job Chain and an associated unit (SIP, DIP or Transfer). The Job Chain runs until completion, which usually puts the Transfer/SIP in another directory, to start another chain.
The MCP uses Gearman to hand out work. The MCP clients are relatively "dumb": they are Gearman worker implementations that tell the Gearman server which tasks they can perform and then wait for the server to assign them a task.
The Archivematica system relies on the client and server having access to the same directory in order to process commands. On a distributed system, this is done through the shared directory.
All files for the MCPServer are found in src/MCPServer/lib/, and all files for the MCPClient are found in src/MCPClient/lib/.
MCP Server
The MCP has watched directories, which are linked to Job Chains. Each Job Chain is designed to carry out a function. The function is broken down into manageable pieces, which are called Job Chain Links. Each of these links performs a task.
Startup
- archivematicaMCP.py looks at all watched directories in WatchedDirectories
  - table WatchedDirectories, file src/MCPServer/lib/archivematicaMCP.py
- for each file in each directory, create unit (SIP, DIP, Transfer) and job chain
  - functions watchDirectories, createUnitAndJobChainThreaded, createUnitAndJobChain
- See below for how a job chain works
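The startup pass above can be pictured roughly as follows. This is a minimal sketch, not the code in archivematicaMCP.py: the list of tuples stands in for rows of the WatchedDirectories table, and `scan_watched_directories` and the tuple layout are hypothetical names chosen for illustration.

```python
import os
import tempfile


def scan_watched_directories(watched):
    """Return one (unit_type, chain, path) tuple per entry found.

    `watched` is a list of (directory, unit_type, chain) tuples standing in
    for rows of WatchedDirectories (hypothetical layout).
    """
    units = []
    for directory, unit_type, chain in watched:
        if not os.path.isdir(directory):
            continue  # skip watched paths that do not exist
        for name in sorted(os.listdir(directory)):
            # Each entry becomes a unit with its own job chain.
            units.append((unit_type, chain, os.path.join(directory, name)))
    return units


# Example with a throwaway directory containing two entries.
demo_dir = tempfile.mkdtemp()
for name in ("a", "b"):
    os.mkdir(os.path.join(demo_dir, name))
demo_units = scan_watched_directories([(demo_dir, "Transfer", "chain-1")])
```

In the real server each unit would be handed to a job chain in its own thread; the sketch only collects the work to be started.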
Tasks Workflow/Job Chains
- jobChain.py looks up its first link in MicroServiceChains.startingLink and starts it
- jobChainLink.py finds its task in MicroServiceChainLinks.currentTask
- jobChainLink.py continues and looks up what type of task it is in TasksConfigs.taskType, instantiates a LinkTaskManager*.py class based on that and passes class-specific info from TasksConfigs.taskTypePKReference
- Most LinkTaskManager*.py classes look up the configuration for the task in StandardTasksConfigs using TasksConfigs.taskTypePKReference, replace all the "%FOO%" parameters, create the actual task and hand it to Gearman to run
  - The unit classes (unitSIP.py, unitDIP.py, unitTransfer.py and unitFile.py) know about the unit, including all possible "%FOO%" parameters that are valid for that unit, and what to replace them with
- Other LinkTaskManager*.py classes use TasksConfigs.taskTypePKReference as a foreign key to another table, or do not use it at all. See #TaskTypes section and #Database Reference for more information
- Gearman runs the tasks, and collects the exit code, stdout and stderr, and calls back to LinkTaskManager*.py.taskCompletedCallBackFunction, which calls jobChainLink.py.linkProcessingComplete
- jobChainLink.py looks in MicroServiceChainLinksExitCodes for the pairing of MicroServiceChainLink and exitCode, to find nextMicroServiceChainLink
- If nothing is found, use MicroServiceChainLinks.defaultNextChainLink
- jobChain.py gets the UUID of the next MicroServiceChainLinks, and starts the next jobChainLink (see above)
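The workflow above can be sketched in a few lines. This is an illustration under stated assumptions, not the MCPServer implementation: the dictionaries are hypothetical in-memory stand-ins for MicroServiceChainLinks and MicroServiceChainLinksExitCodes, the link names and commands are invented, and `run_task` is called synchronously here, whereas the real server hands the task to Gearman and continues from a callback.

```python
import re

# Hypothetical in-memory stand-ins for the database tables.
LINKS = {
    "link-1": {"command": "echo %FOO%", "default_next": "link-fail"},
    "link-2": {"command": "echo done", "default_next": None},
    "link-fail": {"command": "echo failed", "default_next": None},
}
# (link, exit code) -> next link, as in MicroServiceChainLinksExitCodes.
EXIT_CODES = {("link-1", 0): "link-2"}


def replace_parameters(command, replacements):
    """Substitute each %NAME% placeholder the unit knows about."""
    return re.sub(
        r"%(\w+)%",
        lambda match: replacements.get(match.group(1), match.group(0)),
        command,
    )


def next_link(link_id, exit_code):
    """Exit-code specific successor if one is defined, else the default."""
    return EXIT_CODES.get((link_id, exit_code), LINKS[link_id]["default_next"])


def run_chain(starting_link, replacements, run_task):
    """Walk the chain; `run_task(command)` must return an exit code."""
    visited = []
    link_id = starting_link
    while link_id is not None:
        command = replace_parameters(LINKS[link_id]["command"], replacements)
        visited.append((link_id, command))
        link_id = next_link(link_id, run_task(command))
    return visited
```

For example, `run_chain("link-1", {"FOO": "x"}, lambda command: 0)` visits link-1 and then link-2, while a non-zero exit code sends the chain to link-fail through the default next link.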
Task Types
Client
When a client starts, it connects to the specified Gearman server and provides a list of the modules it supports. When the MCP informs the Gearman server of a task that the client supports and the Gearman server assigns the job to the client, the client processes the job and returns the results to the Gearman server, which in turn returns them to the MCP.
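The register-then-wait pattern described here can be modelled with a toy, in-memory job server. The sketch below deliberately does not use the Gearman library or its API; `ToyJobServer`, `count_words` and the result dictionary are hypothetical names that only illustrate the flow of a task from the MCP to a client and back.

```python
class ToyJobServer:
    """In-memory stand-in for a job server; not the Gearman API."""

    def __init__(self):
        self.handlers = {}  # task name -> list of (worker_id, callable)

    def register(self, worker_id, modules):
        """A worker announces the modules (task names) it supports."""
        for name, handler in modules.items():
            self.handlers.setdefault(name, []).append((worker_id, handler))

    def submit(self, task_name, arguments):
        """Assign the task to the first worker that supports it."""
        candidates = self.handlers.get(task_name)
        if not candidates:
            raise LookupError("no worker supports %r" % task_name)
        worker_id, handler = candidates[0]
        exit_code, stdout = handler(arguments)
        return {"worker": worker_id, "exitCode": exit_code, "stdout": stdout}


def count_words(arguments):
    """Hypothetical client module: returns (exit code, stdout)."""
    return 0, str(len(arguments.split()))


server = ToyJobServer()
server.register("client-1", {"countWords": count_words})
result = server.submit("countWords", "one two three")
```

A real job server would queue a task until a suitable worker is free rather than raising an error; the toy version fails fast to keep the example short.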
Database Reference
Database Schema Diagram
The MCP modules are configured in the database, with the following schema. (Generated using MySQL Workbench: sudo apt-get install mysql-workbench)
MicroServiceChains
- Intent: Entry point into chains
- Knows: Description to display (if the user has to choose it), starting chain link
- Referenced by: MicroServiceChainChoice, WatchedDirectories
- startingLink is foreign key to MicroServiceChainLinks
- configuration for 'get user choice to proceed with'
MicroServiceChainLinks
- Intent: The task, when/how to do it, and where to go next. Often referenced
- Knows: currentTask, default/failed next link
- Referenced by: Jobs, MicroServiceChainChoice, MicroServiceChainLinks (itself), MicroServiceChainLinksExitCodes, MicroServiceChains, MicroServiceChoiceReplacementDic, SIPs, Transfers
- currentTask is foreign key to TasksConfigs
TasksConfigs
- aka "the weird table"
- Intent: Starting point to find configs for a link
- Knows: taskType, taskTypePKReference (aka class specific information), Job description presented to user
- Referenced by: MicroServiceChainLinks
- taskType is a foreign key to TaskTypes
- taskTypePKReference is semantically a foreign key to another table (determined by taskType) (often StandardTasksConfigs)
- taskType determines what table to look at; taskTypePKReference is semantically a foreign key to a row in the specified table
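The two-step lookup through TasksConfigs can be sketched with an in-memory SQLite database. Everything beyond the table names and the taskType and taskTypePKReference columns is an assumption made for illustration: the `pk`, `execute` and `arguments` columns, the sample rows, and the mapping from task type to table (including which task type ignores the reference) are hypothetical.

```python
import sqlite3

# Task type description -> table that taskTypePKReference points into.
# Assumed mapping for illustration; None means the reference is not used.
TABLE_FOR_TASK_TYPE = {
    "one instance": "StandardTasksConfigs",
    "get user choice to proceed with": None,
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE TaskTypes (pk TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE TasksConfigs (pk TEXT PRIMARY KEY, taskType TEXT,
                               taskTypePKReference TEXT, description TEXT);
    CREATE TABLE StandardTasksConfigs (pk TEXT PRIMARY KEY, execute TEXT,
                                       arguments TEXT);
    INSERT INTO TaskTypes VALUES ('tt-1', 'one instance');
    INSERT INTO TaskTypes VALUES ('tt-2', 'get user choice to proceed with');
    INSERT INTO StandardTasksConfigs VALUES ('stc-1', 'echo', '%FOO%');
    INSERT INTO TasksConfigs VALUES ('tc-1', 'tt-1', 'stc-1', 'Say hello');
    INSERT INTO TasksConfigs VALUES ('tc-2', 'tt-2', NULL, 'Approve');
""")


def resolve_task_config(tasks_config_pk):
    """Follow taskType to pick the table, then the reference to the row."""
    task_type, reference = db.execute(
        "SELECT tt.description, tc.taskTypePKReference "
        "FROM TasksConfigs tc JOIN TaskTypes tt ON tc.taskType = tt.pk "
        "WHERE tc.pk = ?",
        (tasks_config_pk,),
    ).fetchone()
    table = TABLE_FOR_TASK_TYPE[task_type]
    if table is None:
        return task_type, None
    # The table name comes from the fixed mapping above, never from input.
    row = db.execute(
        "SELECT * FROM %s WHERE pk = ?" % table, (reference,)
    ).fetchone()
    return task_type, row
```

Because the target table depends on the task type, the database cannot enforce this relationship as an ordinary foreign key, which is why the reference is only "semantically" one.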
TaskTypes
- Intent: Define all the ways tasks can be run and, generically, what they do, e.g. run once, run once for each file, give the user choices, etc.
- Knows: N/A
- Referenced by: TasksConfigs
- Each TaskType is associated with a linkTaskManager* class
- More detail
MicroServiceChainLinksExitCodes
- Intent: Show which ChainLink to execute next
- Knows: chainLink associated with it, an exit code, where to go with that
- Referenced by: None
- microServiceChainLink is a foreign key to MicroServiceChainLinks
- nextMicroServiceChainLink is also a foreign key to MicroServiceChainLinks
- Usually there's only one exit code defined for a MicroServiceChainLinks (usually completed successfully), but can also be used to branch workflow depending on result (exit code) of a task.
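Expressed as a query, the exit-code lookup with its fallback to the default next link might look like the sketch below. It uses an in-memory SQLite database with a simplified, assumed schema (the `pk` column and the sample rows are hypothetical); the other column names follow this page.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE MicroServiceChainLinks (
        pk TEXT PRIMARY KEY, defaultNextChainLink TEXT);
    CREATE TABLE MicroServiceChainLinksExitCodes (
        microServiceChainLink TEXT, exitCode INTEGER,
        nextMicroServiceChainLink TEXT);
    INSERT INTO MicroServiceChainLinks VALUES ('link-1', 'link-failed');
    INSERT INTO MicroServiceChainLinksExitCodes VALUES ('link-1', 0, 'link-2');
""")


def next_chain_link(link_pk, exit_code):
    """Return the next link for this exit code, or the link's default."""
    # Caveat: COALESCE also falls back to the default when a matching
    # exit-code row exists but its next link is NULL.
    row = db.execute(
        """
        SELECT COALESCE(ec.nextMicroServiceChainLink, l.defaultNextChainLink)
        FROM MicroServiceChainLinks l
        LEFT JOIN MicroServiceChainLinksExitCodes ec
               ON ec.microServiceChainLink = l.pk AND ec.exitCode = ?
        WHERE l.pk = ?
        """,
        (exit_code, link_pk),
    ).fetchone()
    return row[0] if row else None
```

Adding a second exit-code row for the same link is all it takes to branch the workflow on the result of a task.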
StandardTasksConfigs
- Intent: Configurations for common tasks
- Knows: arguments, what to execute
- Referenced by: None
- configuration for:
  - 'one instance' (LinkTaskManagerDirectories.py)
  - 'for each file' (LinkTaskManagerFiles.py)
  - 'Get microservice generated list in stdOut' (linkTaskManagerGetMicroserviceGeneratedListInStdOut.py)
  - 'Get user choice from microservice generated list' (linkTaskManagerGetUserChoiceFromMicroserviceGeneratedList.py)
  - 'Split Job into many links based on file ID' (linkTaskManagerSplitOnFileIdAndruleset)
WatchedDirectories
- Intent: Knows what directories to watch, and what to do when something happens
- Knows: watched path, what to do, expected type=unit (eg. SIP, DIP, transfer)
- Referenced by: None
- chain is a foreign key to MicroServiceChains
- expectedType is a foreign key to WatchedDirectoriesExpectedTypes
MicroServiceChainChoice
- Intent: Configuration for taskType 'get user choice to proceed with', selection of a chain to process
- Knows: Chain Link that the choice comes from, chain to go to if selected
- Referenced by: None
- choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
- chainAvailable is a foreign key to MicroServiceChains
- configuration for 'get user choice to proceed with' (linkTaskManagerChoice.py)
MicroServiceChoiceReplacementDic
- Intent: Configuration for taskType 'get replacement dic from user choice'
- Knows: Chain Link that the choice comes from, text to display, and a dict of {replacement_string : code_thing }
- Referenced by: None
- choiceAvailableAtLink is a foreign key to MicroServiceChainLinks
- configuration for 'get replacement dic from user choice' (linkTaskManagerReplacementDicFromChoice.py)
FileIDsBySingleID
- Intent: Map a tool and its output to a FileID (file format)
- Knows: Tool, tool version, tool output, what FileID that corresponds to
- Referenced By: None
- fileID is a foreign key to FileIDs
FileIDs
- Intent: Describe a file format
- Knows: valid for preservation or access, description, how the file was identified
- Referenced By: CommandRelationships, FileIDsBySingleID, FilesIdentifiedIDs
- fileIDType is a foreign key to FileIDTypes
FileIDTypes
- Intent: Stores a list of all the ways/tools a file can be identified by
- Knows: All the ways a file can be identified
- Referenced By: FileIDs
- Small, ~13 entries
FilesIdentifiedIDs
- Intent: Mapping between File and File ID
- Knows: File, FileID
- Referenced By: None
- fileUUID is a foreign key to Files
- fileID is a foreign key to FileIDs
Files
- Intent: Information about a file
- Knows: UUID, original and current location, SIP or Transfer UUID, etc
- Referenced By: FilesIDs, FilesIdentifiedIDs
CommandRelationships
- aka "Format Policy Rule"
- Intent: Map between file ID, command classification and command
- Knows: fileID, commandClassification, Command, statistics on success/failure
- Referenced By: None
- Effectively has a three part primary key - fileID, commandClassification and command
- fileID is a foreign key to FileIDs
- commandClassification is a foreign key to CommandClassifications
- command is a foreign key to Commands
Commands
- Intent: Information about the command
- Knows: command itself, verification and event detail commands, output location and format
- Referenced By:
- commandType is a foreign key to CommandTypes
- eventDetailCommand and verificationCommand are foreign keys to Commands (itself)
CommandTypes
- Intent: Type of command being run
- Knows: All possible types of commands
- Referenced By: Commands
- Small, ~3 entries (bashScript, pythonScript, command [line])
CommandClassifications
- Intent: Classification (access, preservation etc) of the command being run
- Knows: All possible classifications
- Referenced By:
- Small, ~4 entries - thumbnail, access, preservation, extraction
- access: 3141bc6f-7f77-4809-9244-116b235e7330
- preservation: 3d1b570f-f500-4b3c-bbbc-4c58aad05c27
- thumbnail: 27c2969b-b6a0-441d-888d-85292b692064
DefaultCommandsForClassifications
- Intent: Default action for a given classification
- Knows: command classification, chain link to run
- Referenced By:
- forClassification is a foreign key to CommandClassifications
- MicroserviceChainLink is a foreign key to MicroServiceChainLinks
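One way to read the format policy tables together is sketched below. It assumes that the default for a classification is used only when no CommandRelationships row matches the file ID, which this page does not state explicitly. All identifiers in the sample data are hypothetical placeholders, not real rows.

```python
# Each rule is a (fileID, commandClassification, command) triple, mirroring
# the three-part key of CommandRelationships. Sample values are invented.
COMMAND_RELATIONSHIPS = {
    ("fileid-bmp", "preservation", "cmd-to-tiff"),
    ("fileid-bmp", "access", "cmd-to-jpeg"),
    ("fileid-bmp", "access", "cmd-to-png"),
}

# classification -> chain link to run, as in DefaultCommandsForClassifications.
DEFAULTS = {
    "access": "link-default-access",
    "preservation": "link-default-preservation",
}


def commands_for(file_id, classification):
    """All commands whose rule matches this file ID and classification."""
    return sorted(
        command
        for rule_file_id, rule_classification, command in COMMAND_RELATIONSHIPS
        if rule_file_id == file_id and rule_classification == classification
    )


def plan(file_id, classification):
    """Use the matching rules, else the classification's default link."""
    commands = commands_for(file_id, classification)
    if commands:
        return ("commands", commands)
    return ("default_link", DEFAULTS.get(classification))
```

Because the command is part of the key, one file ID can have several commands for the same classification, as the two access rules for the sample format show.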
Debugging
See Debugging tips.
Debugging the MCP can be a difficult task. Logs can be large, and are placed in the /tmp/ directory, so they are automatically removed upon reboot.
Parsing Logs
Here are some commands to help parse logs:
grep "DEBUG type=\"archivematicaMCP\"" -v /tmp/archivematicaMCPServer* -h > /tmp/archivematicaOutput.txt
Removes the periodic debug message prints.
grep "Traceback (most recent call last):" /tmp/archivematicaOutput.txt -n
grep -i EXCEPTION /tmp/archivematicaMCPServer-* -n
-n will prepend the line number
sed -n '302092,+50'p /tmp/archivematicaMCPServer-*
prints line 302092 and the 50 lines that follow it. This is useful for looking at sections of the log that contain exceptions, whose line numbers can be found with the grep commands above.
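The grep and sed steps can also be combined in a short script that prints every traceback together with the lines that follow it. This is a convenience sketch, not part of Archivematica; `lines_with_context` is a hypothetical helper name.

```python
def lines_with_context(lines, needle, after=50):
    """Yield (line_number, text) for each match and the `after` lines after it."""
    remaining = 0
    for number, text in enumerate(lines, start=1):
        if needle in text:
            remaining = after + 1  # the matching line plus its context
        if remaining > 0:
            yield number, text
            remaining -= 1
```

To use it on a log, open one of the /tmp/archivematicaMCPServer-* files and pass the file object as `lines` with "Traceback (most recent call last):" as `needle`.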
Debugging tools
In extreme cases, you can set up your dev environment so that you log in as the archivematica user and use Eclipse with PyDev in debug mode to run the MCP.
what clients are connected
python -c '
import gearman
admin = gearman.admin_client.GearmanAdminClient(host_list=["127.0.0.1"])
for client in admin.get_workers():
    if client["client_id"] != "-":  # exclude server task connections
        print client["client_id"], client["ip"]
for stat in admin.get_status():
    if stat["running"] != 0 or stat["queued"] != 0:
        print stat
'
Watching activity
tail /tmp/archivematicaMCP* -f
watch mysql -u root MCP --execute "\"SELECT * FROM Tasks WHERE endTime = 0;\""
Turning on printing all sql queries
sudo nano /usr/lib/archivematica/archivematicaCommon/databaseInterface.py
- http://code.google.com/p/archivematica/source/browse/tags/release-0.8-alpha/src/archivematicaCommon/lib/databaseInterface.py
- edit lines 34 and 73
- "printSQL = False" -> "printSQL = True"
- " print printSQL" -> " print sql"
This causes Archivematica to print ALL of the queries it issues to the database.
Change Log
0.8
- Switched to database configuration.
- Allows for alternative workflows (e.g. don't create a DIP)
- On start, the MCP server tries to match any existing directories in the watched directories to a processing directory/SIP.
0.7.1
- Work was done on microservices to make the system more stable.
- A config to set the underlying protocol max length was added.
0.7
- Work was done on microservices to make the system more stable.
0.6.2
- MCP was released.