Administrator manual 0.10

From Archivematica
Revision as of 18:51, 10 September 2013 by Evelyn (talk | contribs) (→‎Users)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Main Page > Documentation > Administrator Manual

This manual covers administrator-specific instructions for Archivematica. It will also provide help for using forms in the Administration tab of the Archivematica dashboard and the administrator capabilities in the Format Policy Registry (FPR), which you will find in the Preservation planning tab of the dashboard.

For end-user instructions, please see the user manual.

Installation[edit]

Upgrading[edit]

  • Currently, Archivematica does not support upgrading from one version to the next. A re-install is required.

Dashboard administration tab[edit]

The Archivematica administration pages, under the Administration tab of the dashboard, allows you to configure application components and manage users.

Transfer source directories[edit]

Archivematica allows you to start transfers using the operating system's file browser or via a web interface. Source files for transfers, however, can't be uploaded using the web interface: they must exist on volumes accessible to the Archivematica server.

When starting a transfer you're required to select one or more directories of files to add to the transfer. To speed up the process of selecting directories, Archivematica allows you to specify "source directories". A source directory is a directory in which files and directories likely to be added to a transfer are present.

To add a source directory, while on the transfer source directories page of the Administration tab in the dashboard, simply click the folder icon to expand the starting directory and navigate the interface until you find a directory you'd like to select as a source directory. Once you've found a suitable source directory, click the "Add" button to the right of the directory and it will be added.

To remove a source directory, simply click the "Remove" button to the right of it in the source directory path list.

AIP storage directories[edit]

AIP storage directories are directories in which completed AIPs are stored. Storage directories can be specified in a manner similar to source directories.

To add a storage directory, while on the storage directories page of the Administration tab in the dashboard, simply click the folder icon to expand the starting directory and navigate the interface until you find a directory you'd like to select as a storage directory. Once you've found a suitable storage directory, click the "Add" button to the right of the directory and it will be added.

To remove a storage directory, simply click the "Remove" button to the right of it in the storage directory path list.

AtoM DIP upload[edit]

Archivematica can upload DIPs directly to an AtoM website so the contents can be accessed online. The AtoM DIP upload configuration page is where you specify the details of the AtoM installation you'd like the DIPs uploaded to (and, if using Rsync to transfer the DIP files, Rsync transfer details).

The parameters that you'll most likely want to set are url, email, and password. These parameters, respectively, specify the destination AtoM website's URL, the email address used to log in to the website, and the password used to log in to the website.

AtoM DIP upload can also use Rsync as a transfer mechanism. Rsync is an open source utility for efficiently transferring files. The rsync-target parameter is used to specify an Rsync-style target host/directory pairing, "foobar.com:~/dips/" for example. The rsync-command parameter is used to specify rsync connection options, "ssh -p 22222 -l user" for example. If you are using the rsync option, please see AtoM server configuration below.

To set any parameters for AtoM DIP upload change the values, preserving the existing format they're specified in, in the "Command arguments" field then click "Save".

Note that in AtoM, the sword plugin (Admin --> qtSwordPlugin) and job scheduling (Admin --> Settings --> Job scheduling) must both be enabled in order for AtoM to receive uploaded DIPs.

AtoM server configuration[edit]

This server configuration step is necessary to allow Archivematica to log in to the AtoM server without passwords, and only when the user is deploying the rsync option described above in the AtoM DIP upload section.

To enable sending DIPs from Archivematica to the AtoM server:

Generate SSH keys for the Archivematica user. Leave the passphrase field blank.

 $ sudo -i -u archivematica
 $ cd ~
 $ ssh-keygen

Copy the contents of /var/lib/archivematica/.ssh/id_rsa.pub somewhere handy, you will need it later.

Now, it's time to configure the AtoM server so Archivematica can send the DIPs using SSH/rsync. For that purpose, you will create a user called archivematica and we are going to assign that user a restricted shell with access only to rsync:

 $ sudo apt-get install rssh
 $ sudo useradd -d /home/archivematica -m -s /usr/bin/rssh archivematica
 $ sudo passswd -l archivematica
 $ sudo vim /etc/rssh.conf // Make sure that allowrsync is uncommented!

Add the SSH key that we generated before:

 $ sudo mkdir /home/archivematica/.ssh
 $ chmod 700 /home/archivematica/.ssh/
 $ sudo vim /home/archivematica/.ssh/authorized_keys // Paste here the contents of id_dsa.pub
 $ chown -R archivematica:archivematica /home/archivematica

In Archivematica, make sure that you update the --rsync-target accordingly.
These are the parameters that we are passing to the upload-qubit microservice.
Go to the Administration > Upload DIP page in the dashboard.

Generic parameters:

--url="http://atom-hostname/index.php" \
--email="demo@example.com" \
--password="demo" \
--uuid="%SIPUUID%" \
--rsync-target="archivematica@atom-hostname:/tmp" \
--debug

CONTENTdm DIP upload[edit]

Archivematica can also upload DIPs to CONTENTdm instances. Multiple CONTENTdm destinations may be configured. For each possible CONTENTdm DIP upload destination, you'll specify a brief description and configuration parameters appropriate for the destination. Paramters include %ContentdmServer% (full path to the CONTENTdm API, including the leading 'http://' or 'https://', for example http://example.com:81/dmwebservices/index.php), %ContentdmUser%, and %ContentdmGroup% (Linux user and group on the CONTENTdm server, not a CONTENTdm username). Note that only %ContentdmServer% is required is you are going to produce CONTENTdm Project Client packages; %ContentdmUser%, and %ContentdmGroup% are also required if you are going to use the "direct upload" option for uploading your DIPs into CONTENTdm.

When changing parameters for a CONTENTdm DIP upload destination simply change the values, preserving the existing format they're specified in. To add an upload destination fill in the form at the bottom of the page with the appropriate values. When you've completed your changes click the "Save" button.

Processing configuration[edit]

When processing a SIP or transfer, you may want to automate some of the workflow choices. Choices can be preconfigured by putting a 'processingMCP.xml' file into the root directory of a SIP/transfer.

If a SIP or transfer is submitted with a 'processingMCP.xml' file, processing decisions will be made with the included file.

The XML file format is:

<processingMCP>
  <preconfiguredChoices>
    <preconfiguredChoice>
      <appliesTo>Workflow decision - create transfer backup</appliesTo>
      <goToChain>Do not backup transfer</goToChain>
    </preconfiguredChoice>
    <preconfiguredChoice>
      <appliesTo>Workflow decision - send transfer to quarantine</appliesTo>
      <goToChain>Skip quarantine</goToChain>
    </preconfiguredChoice>
    <preconfiguredChoice>
      <appliesTo>Remove from quarantine</appliesTo>
      <goToChain>Unquarantine</goToChain>
      <delay unitCtime="yes">50</delay>
    </preconfiguredChoice>
  </preconfiguredChoices>
</processingMCP>

Where appliesTo is the name of the job presented in the dashboard, and goToChain is the desired selection. Note: these are case sensitive. The default processingMCP.xml file is located at '/var/archivematica/sharedDirectory/sharedMicroServiceTasksConfigs/processingMCPConfigs/defaultProcessingMCP.xml'.

The processing configuration administration page of the dashboard provides you with an easy form to configure the default 'processingMCP.xml' that's added to a SIP or transfer if it doesn't already contain one. When you change the options using the web interface the necessary XML will be written behind the scenes.

Processing configuration form in Administration tab of the dashboard
  • For the approval (yes/no) steps, the user ticks the box on the left-hand side to make a choice. If the box is not ticked, the approval step will appear in the dashboard.
  • For the other steps, if no actions are selected the choices appear in the dashboard
  • You can select whether or not to send transfers to quarantine (yes/no) and decide how long you'd like them to stay there.
  • You can approve normalization, sending the AIP to storage, and uploading the DIP without interrupting the workflow in the dashboard.
  • You can pre-select which format identification tool to base your normalization upon.
  • You can choose to send a transfer to backlog or to create a SIP every time.
  • You can select between lzma and bzip algorithms for AIP compression.
  • For select compression level, the options are as follows:
    • 9 - ultra compression
    • 7 - maximum compression
    • 3 - fast compression mode
    • 1 - fastest mode
    • 0 - copy mode
  • You can select one archival storage location where you will consistently send your AIPs.

PREMIS agent[edit]

The PREMIS agent name and code can be set via the administration interface.

thumbs

Rest API[edit]

In addition to automation using the processingMCP.xml file, Archivematica includes a REST API for automating transfer approval. Using this API, you can create a custom script that copies a transfer to the appropriate directory then uses the curl command, or some other means, to let Archivematica know that the copy is complete.

API keys[edit]

Use of the REST API requires the use of API keys. An API key is associated with a specific user. To generate an API key for a user:

  1. Browse to /administration/accounts/list/
  2. Click the "Edit" button for the user you'd like to generate an API key for
  3. Click the "Regenerate API key" checkbox
  4. Click "Save"

After generating an API key, you can click the "Edit" button for the user and you should see the API key.

IP whitelist[edit]

In addition to creating API keys, you'll need to add the IP of any computer making REST requests to the REST API whitelist. The IP whitelist can be edited in the administration interface at /administration/api/.

Approving a transfer[edit]

The REST API can be used to approve a transfer. The transfer must first be copied into the appropriate watch directory. To determine the location of the appropriate watch directory, first figure out where the shared directory is from the sharedDirectory value of /etc/archivematica/MCPServer/serverConfig.conf. Within that directory is a subdirectory activeTransfers. In this subdirectory are watch directories for the various transfer types.

When using the REST API to approve a transfer, if a transfer type isn't specified, the transfer will be deemed a standard transfer.

HTTP Method: POST

URL: /api/transfer/approve

Parameters:

directory: directory name of the transfer

type (optional): transfer type [standard|dspace|unzipped bag|zipped bag]

api_key: an API key

username: the username associated with the API key

Example curl command:

   curl --data "username=rick&api_key=f12d6b323872b3cef0b71be64eddd52f87b851a6&type=standard&directory=MyTransfer" http://127.0.0.1/api/transfer/approve

Example result:

   {"message": "Approval successful."}

Listing unapproved transfers[edit]

The REST API can be used to get a list of unapproved transfers. Each transfer's directory name and type is returned.

Method: GET

URL: /api/transfer/unapproved

Parameters:

api_key: an API key

username: the username associated with the API key

Example curl command:

   curl "http://127.0.0.1/api/transfer/unapproved?username=rick&api_key=f12d6b323872b3cef0b71be64eddd52f87b851a6"

Example result:

   {
       "message": "Fetched unapproved transfers successfully.",
       "results": [{
               "directory": "MyTransfer",
              "type": "standard"
           }
       ]
   }

Users[edit]

The dashboard provides a simple cookie-based user authentication system using the Django authentication framework. Access to the dashboard is limited only to logged-in users and a login page will be shown when the user is not recognized. If the application can't find any user in the database, the user creation page will be shown instead, allowing the creation of an administrator account.

Users can be also created, modified and deleted from the Administration tab. Only users who are administrators can create and edit user accounts.

You can add a new user to the system by clicking the "Add new" button on the user administration page. By adding a user you provide a way to access Archivematica using a username/password combination. Should you need to change a user's username or password, you can do so by clicking the "Edit" button, corresponding to the user, on the administration page. Should you need to revoke a user's access, you can click the corresponding "Delete" button.

CLI creation of administrative users[edit]

If you need an additional administrator user one can be created via the command-line after navigating to the src/dashboard/src directory in the source tree.

   python manage.py createsuperuser --settings='settings.common'

CLI password resetting[edit]

If you've forgotten the password for your administrator user, or any other user, you can change it via the command-line.

   python manage.py changepassword <username> --settings='settings.common'

Security[edit]

Archivematica uses PBKDF2 as the default algorithm to store passwords. This should be sufficient for most users: it's quite secure, requiring massive amounts of computing time to break. However, other algorithms could be used as the following document explains: How Django stores passwords.

Our plan is to extend this functionality in the future adding groups and granular permissions support.

Dashboard preservation planning tab[edit]

Format Policy Registry (FPR)[edit]

  • The Format Policy Registry (FPR) is how Archivematica manages preservation planning using format policies. A format policy indicates the actions, tools and settings to apply to a file of a particular file format (e.g. conversion to preservation format, conversion to access format). Format policies will change as community standards, practices and tools evolve.
  • Hosted at fpr.archivematica.org, the FPR server stores structured information about normalization format policies for preservation and access. These policies identify preferred preservation and access formats. These default format policies can all be changed or enhanced locally by individual Archivematica implementers. The purpose of the FPR Server is to provide a mechanism for organizations to collaborate on the very difficult problem of Preservation Planning for the long tail of diverse digital file formats, that exist now, and will exist in the future. By providing a central online location where different organizations can review, compare, and adopt format policies, Artefactual hopes to make a significant contribution to the productivity and effectiveness of the digital preservation efforts of Archivematica users. For information about how default format policies were selected, see the analysis of significant characteristics and tools here: Format policies
  • Subscription to the FPR will allow the Archivematica project to notify users when new or updated preservation and access plans become available, allowing them to make better decisions about normalization and migration strategies for specific format types within their collections. It will also allow them to trigger migration processes as new tools and knowledge becomes available.
  • New Archivematica installations will attempt to connect to the FPR Server as part of the initial configuration process. The FPR Server is a web application hosted by Artefactual, on a secure server in Toronto. During the last step of the intsallation process, Archivematica attempts to send the following information to the FPR Server, over an encrypted connection:
  1. Agent Identifier (supplied by the user during registration while installing Archivematica)
  2. Agent Name (supplied by the user during registration while installing Archivematica)
  3. IP address of host
  4. UUID of Archivematica instance
  5. current time
  • If Archivematica is not able to connect to the FPR Server during installation, no further attempts are ever made. However, the Preservation Planning tab in the Archivematica dashboard has a 'check for updates' button. This button initiates a connection from the FPR Server, where Archivematica asks for and receives a full set of Format Policy Rules from the FPR Server. No information is sent to the FPR Server during this process. The rules are processed locally by Archivematica, and used to update the local Format Policy Registry. The FPR Server is able to log each https request that is sent to it, but no authentication takes place, and only IP address and current time is logged (not agent name and agent identifier).
  • In the 0.10 release, no information about the local format policy rules is sent to the FPR Server. Other than the initial connection during installation, where the Agent information is uploaded, all communication with the FPR Server is one-way, from the FPR Server to Connection to the FPR Server is voluntary, any organization using Archivematica that does not want to connect to the FPR Server can either:
    • a) make sure the Archivematica machine cannot reach the FPR Server by putting it behind a firewall, or disabling it's internet connection, or
    • b) modify the source code of Archivematica and comment out the FPR connection attempt
  • For any organization not wishing to have their Archivematica installations communicate in any way with the FPR Server, opting out can be as simple as disconnecting the network cable from the host machine while filling out the Agent information screen during installation, and making sure not to click the 'check for updates' button on the preservation planning tab.
  • The only functionality lost in blocking access to the FPR Server is the ability to update the local Format Policy Registry with new Format Policy Rules published by Artefactual. It would still be possible to get those rules when new versions Archivematica are released, but not between releases.
  • In future versions of Archivematica, Artefactual intends to increase the functionality provided by the FPR Server, including allowing organizations to share their format policies. Participation will be entirely optional, and we will provide a method for organizations to share their format policies anonymously. Users will also be able to turn off FPR updates.
  • The only information that will be passed back and forth between Archivematica and the FPR Server would be these format policies - what tool to run when normalizing for a given purpose (access, preservation) when a specific File Identification Tool identifies a specific File Format. No information about the content that has been run through Archivematica, or any details about the Archivematica installation or configuration would be sent to the FPR Server.
  • Because Archivematica is an open source project, it is possible for any organization to conduct a software audit/code review before running Archivematica in a production environment in order to independently verify the information being shared with the FPR Server. An organization could choose to run a private FPR Server, accessible only within their own network(s), to provide at least a limited version of the benefits of sharing format policies, while guaranteeing a completely self-contained preservation system. This is something that Artefactual is not intending to develop, but anyone is free to extend the software as they see fit, or to hire us or other developers to do so.

Customization and automation[edit]

  • Workflow processing decisions can be made in the processingMCP.xml file. See here.
  • Workflows are currently created at the development level.
    Some resources avialable
  • Normalization commands can be viewed in the preservation planning tab.
  • Normalization paths and commands are currently editable under the preservation planning tab in the dashboard.

Elasticsearch[edit]

Archivematica has the capability of indexing data about files contained in AIPs and this data can be accessed programatically for various applications.

If, for whatever reason, you need to delete an ElasticSearch index please see ElasticSearch Administration.

If, for whatever reason, you need to delete an Elasticsearch index programmatically, this can be done with pyes using the following code.

import sys
sys.path.append("/home/demo/archivematica/src/archivematicaCommon/lib/externals")
from pyes import *
conn = ES('127.0.0.1:9200')

try:
    conn.delete_index('aips')
except:
    print "Error deleting index or index already deleted."

Rebuilding the AIP index[edit]

To rebuild the ElasticSearch AIP index enter the following to find the location of the rebuilding script:

   locate rebuild-elasticsearch-aip-index-from-files

Copy the location of the script then enter the following to perform the rebuild (substituting "/your/script/location/rebuild-elasticsearch-aip-index-from-files" with the location of the script):

   /your/script/location/rebuild-elasticsearch-aip-index-from-files <location of your AIP store>

Data backup[edit]

In Archivematica there are three types of data you'll likely want to back up:

  • Filesystem (particularly your storage directories)
  • MySQL
  • ElasticSearch

MySQL is used to store short-term processing data. You can back up the MySQL database by using the following command:

mysqldump -u <your username> -p<your password> -c MCP > <filename of backup>

ElasticSearch is used to store long-term data. Instructions and scripts for backing up and restoring ElasticSearch are available here.

Security[edit]

Once you've set up Archivematica it's a good practice, for the sake of security, to change the default passwords.

MySQL[edit]

You should create a new MySQL user or change the password of the default "archivematica" MySQL user. The change the password of the default user, enter the following into the command-line:

$ mysql -u root -p<your MyQL root password> -D mysql \
   -e "SET PASSWORD FOR 'archivematica'@'localhost' = PASSWORD('<new password>'); \
   FLUSH PRIVILEGES;"

Once you've done this you can change Archivematica's MySQL database access credentials by editing these two files:

  • /etc/archivematica/archivematicaCommon/dbsettings (change the user and password settings)
  • /usr/share/archivematica/dashboard/settings/common.py (change the USER and PASSWORD settings in the DATABASES section)

Archivematica does not presently support secured MySQL communication so MySQL should be run locally or on a secure, isolated network. See issue 1645.

AtoM[edit]

In addition to changing the MySQL credentials, if you've also installed AtoM you'll want to set the password for it as well. Note that after changing your AtoM credentials you should update the credentials on the AtoM DIP upload administration page as well.

Gearman[edit]

Archivematica relies on the German server for queuing work that needs to be done. Gearman currently doesn't support secured connections so Gearman should be run locally or on a secure, isolated network. See issue 1345.

Questions[edit]

If you run into any difficulties while administrating Archivematica, please check out our FAQ and, if that doesn't help you, contain us using the Archivematica discussion group.

Frequently asked questions[edit]

Discussion group[edit]