MCPServer

From Archivematica
Revision as of 15:49, 23 April 2013 by Joseph (Talk | contribs)




Overview

The MCP is the core of the Archivematica system: it controls the system's micro-services. Configuration and processing information are held in the database. The user monitors and controls the MCP via the dashboard. The MCP maintains a log of all completed work.

The MCP uses Gearman to distribute work. The MCP clients are relatively "dumb": they are Gearman worker implementations that tell the Gearman server which tasks they can perform, then wait for the server to assign them a task.

The Archivematica system relies on the client and server having access to the same directory in order to process commands. On a distributed system, this is provided by the shared directory.

Basic configuration is described in MCP Basic Configuration (deprecated).

Server And Database

The MCP has watched directories, which are linked to Job Chains. Each Job Chain is designed to carry out a function. The function is broken down into manageable pieces, which are called Job Chain Links. Each of these links performs a task. As in previous versions of the MCP, these tasks may be configured to run once, or once for each file in a directory.
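The relationship between a chain, its links, and the "once or once per file" setting can be sketched in a few lines of Python. This is only an illustrative model: `ChainLink`, `run_chain`, and the `per_file` flag are hypothetical names, not the MCP's actual classes or database columns.

```python
import os


class ChainLink(object):
    """One step of a chain (hypothetical model, not the MCP's own class)."""

    def __init__(self, name, task, per_file=False):
        self.name = name
        self.task = task          # callable that receives a path
        self.per_file = per_file  # True: run once for each file in the directory


def run_chain(links, directory):
    """Run every link in order and return a list of (link name, path) records."""
    log = []
    for link in links:
        if link.per_file:
            # One task per directory entry, in a stable order.
            targets = [os.path.join(directory, name)
                       for name in sorted(os.listdir(directory))]
        else:
            # A single task for the directory as a whole.
            targets = [directory]
        for target in targets:
            link.task(target)
            log.append((link.name, target))
    return log
```

A chain with one per-directory link and one per-file link, run over a directory holding two files, would therefore record three tasks.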

One major change is that the MCP is no longer as linear as it once was. Decision points let the user select the next micro-service chain to process, based on what is available at that point. This allows alternative, yet similar, workflows to coexist in the Archivematica MCP system.
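A decision point can be pictured as a mapping from a user-visible choice to the chain that runs next, filtered by what is available. The sketch below assumes that shape; the function names, choice labels, and chain names are invented for illustration and do not come from the MCP database.

```python
def available_choices(decision_point, available_chains):
    """Keep only the choices whose target chain is currently available."""
    return dict((label, chain)
                for label, chain in decision_point.items()
                if chain in available_chains)


def choose_next_chain(decision_point, available_chains, selection):
    """Return the chain for the user's selection, or raise if it is not offered."""
    choices = available_choices(decision_point, available_chains)
    if selection not in choices:
        raise ValueError("%r is not available at this decision point" % (selection,))
    return choices[selection]
```

For example, with the hypothetical mapping `{"Create DIP": "createDIP", "Skip DIP": "storeAIP"}` and both chains available, selecting "Skip DIP" would route processing to the `storeAIP` chain.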

Job Chains

Job Chain Links

Decision Point

Regular Job

Task Types

MCP/TaskTypes

mcp Modules

The mcp Modules are configured in the database, with the following schema.

The diagram may be slightly out of date. It was generated with MySQL Workbench (sudo apt-get install mysql-workbench).

MCP configuration database schema.png

Client

Clients connect to the specified Gearman server and provide a list of the modules they support. When the MCP informs the Gearman server of a task that a client supports, and the Gearman server assigns the job to that client, the client processes the job and returns the results to the Gearman server, which in turn returns them to the MCP.
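The round trip described above can be modelled without Gearman at all. The following is a toy, in-memory stand-in written for illustration only: `ToyBroker` and its methods are hypothetical, and a real deployment uses the Gearman server and worker protocol instead.

```python
from collections import deque


class ToyBroker(object):
    """In-memory stand-in for the Gearman server (illustration only)."""

    def __init__(self):
        self.workers = {}     # client id -> {module name: callable}
        self.queue = deque()  # pending (module name, payload) jobs

    def register(self, client_id, modules):
        """A client announces the modules it supports."""
        self.workers[client_id] = dict(modules)

    def submit(self, module, payload):
        """The MCP side queues a job for a named module."""
        self.queue.append((module, payload))

    def dispatch(self):
        """Assign each queued job to the first client that supports it."""
        results = []
        unassigned = deque()
        while self.queue:
            module, payload = self.queue.popleft()
            for client_id in sorted(self.workers):
                handler = self.workers[client_id].get(module)
                if handler is not None:
                    results.append((client_id, module, handler(payload)))
                    break
            else:
                # No connected client supports this module; keep the job queued.
                unassigned.append((module, payload))
        self.queue = unassigned
        return results
```

Jobs for modules that no connected client advertises simply stay in the queue, which mirrors the idea that a task can only be assigned to a client that has announced support for it.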

Client on Windows

There has been some consideration of getting an MCP client to run on Microsoft Windows, which would be advantageous for normalizing in a Windows environment. Some testing has been done to this end; see issue 372.

Debugging

Debugging the MCP can be difficult. Logs can be large and are written to the /tmp/ directory, so they are removed automatically on reboot.

Parsing Logs

Here are some commands to help parse logs:

grep "DEBUG type=\"archivematicaMCP\"" -v /tmp/archivematicaMCPServer* -h > /tmp/archivematicaOutput.txt 

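This filters out the periodic debug messages (-v inverts the match, -h omits file names) and writes the remaining lines to /tmp/archivematicaOutput.txt.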

grep "Traceback (most recent call last):" /tmp/archivematicaOutput.txt  -n
grep -i EXCEPTION /tmp/archivematicaMCPServer-* -n

The -n option prefixes each match with its line number.

sed -n '302092,+50'p /tmp/archivematicaMCPServer-*

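This prints line 302092 and the 50 lines that follow it (the addr,+N range is a GNU sed extension). It is useful for inspecting the parts of the log around an exception, whose line numbers can be found with the grep commands above. If the glob matches more than one file, sed numbers lines across the concatenated input, so pass it the single file that grep reported.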
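The grep and sed steps above can also be combined into one pass. The sketch below is a hypothetical helper, not part of Archivematica: it streams a single log file and collects each traceback marker together with the lines that follow it. If a second marker appears inside the context window of the first, a new entry starts there and the earlier one is cut short.

```python
import io

MARKER = "Traceback (most recent call last):"


def find_tracebacks(path, context=50):
    """Return [(line number, [marker line plus following lines])] for one log file."""
    hits = []
    remaining = 0  # lines still to collect for the most recent marker
    with io.open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, 1):
            if MARKER in line:
                hits.append((number, []))
                remaining = context + 1  # the marker line itself plus the context
            if remaining > 0:
                hits[-1][1].append(line.rstrip("\n"))
                remaining -= 1
    return hits
```

Because the file is read line by line, memory use stays small even for large logs. The returned line numbers are 1-based, matching grep -n.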

Debugging tools

In extreme cases, you can set up your development environment so that you log in as the archivematica user and run the MCP in debug mode using Eclipse with PyDev.

What clients are connected

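python -c '
import gearman
# Connect to the admin interface of the local gearman server.
admin = gearman.admin_client.GearmanAdminClient(host_list=["127.0.0.1"])
# List the connected workers.
for client in admin.get_workers():
    if client["client_id"] != "-":  # exclude server task connections
        print("%s %s" % (client["client_id"], client["ip"]))

# List the tasks that currently have running or queued jobs.
for stat in admin.get_status():
    if stat["running"] != 0 or stat["queued"] != 0:
        print(stat)
'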

Watching activity

tail /tmp/archivematicaMCP* -f
watch mysql -u root MCP --execute "\"SELECT * FROM Tasks WHERE endTime = 0;\""

Turning on printing of all SQL queries

sudo nano /usr/lib/archivematica/archivematicaCommon/databaseInterface.py

Source for reference: http://code.google.com/p/archivematica/source/browse/tags/release-0.8-alpha/src/archivematicaCommon/lib/databaseInterface.py

Edit lines 34 and 73:

"printSQL = False" -> "printSQL = True"
" print printSQL" -> " print sql"

This causes Archivematica to print every query it issues to the database.

Change Log

0.8

  • Switched to database configuration.
  • Allows for alternative workflows (e.g. don't create a DIP).
  • On start, the MCP server tries to match any existing directories in the watched directories to a processing directory/SIP.

0.7.1

  • Work was done on microservices to make the system more stable.
  • A config to set the underlying protocol max length was added.

0.7

  • Work was done on microservices to make the system more stable.

0.6.2

  • MCP was released.