Improvements/CentOS-RedHat support

From Archivematica

User Story

As a systems administrator, I would like to be able to run Archivematica on an RPM-based Linux distribution such as CentOS or Red Hat.


Status

Discussion/Analysis. No code has yet been written to support this improvement.

Interest

Please feel free to add your organization's name to this list if you have an interest in this improvement.

Artefactual would like to see this improvement developed. We are able to do the development work, for a fee. We are also willing to help others complete all or part of the required work, in order to reduce the scope to a level where Artefactual can complete the process as part of our existing commitment to provide new packages with each release.

Analysis:

The Current Situation

Currently, Archivematica only works on LTS versions of Ubuntu (14.04 and 12.04). This is not due to any specific limitation in the Archivematica codebase; it is entirely a matter of packaging and bundling the large number of dependencies that an Archivematica installation requires.

Archivematica consists of 5 separate packages, written primarily in Python. Artefactual currently builds Ubuntu .deb packages for all 5 and makes them available via Launchpad (e.g., the most current stable version is at https://launchpad.net/~archivematica/+archive/ubuntu/1.4).

It is not necessary to use those .deb packages to install Archivematica. For example, see these instructions for installing Archivematica using Ansible and Vagrant (https://wiki.archivematica.org/Getting_started#Installation). In that example, the 5 Archivematica packages are installed directly from GitHub as source, not from .deb packages. However, the longer list of external dependencies is still installed from .deb packages.

Some of those .deb packages, like Apache and MySQL, are available in the standard Ubuntu repositories. Others, like jhove, are in the Ubuntu multiverse repository, which is not always enabled by default but is easy to enable. Others are either not available from the Ubuntu repositories at all, or the versions available there are too old to work with Archivematica. For these packages, Artefactual has taken on the responsibility of building .deb packages and hosting them on Launchpad. Several are listed at https://launchpad.net/~archivematica/+archive/ubuntu/1.4, including bagit, bulk_extractor and ffmpeg.

List of Archivematica Dependencies (in Ubuntu)

Installed Using Packages
Storage Service
   - "python-lxml"
   - "nginx"
   - "unar"
   - "uwsgi"
   - "uwsgi-plugin-python"
   - "python-virtualenv"
   - "python-dev"
   - "libxml2-dev"
   - "libxslt1-dev"
   - "libz-dev"
   - "libffi-dev"
   - "libssl-dev"
Pipeline (dashboard, MCP Server, MCP Client)
   - "python"
   - "python-pip"
   - "python2.7-elementtree"
   - "python-mimeparse"
   - "python-dateutil"
   - "apache2-mpm-prefork"
   - "libapache2-mod-wsgi"
   - "python-simplejson"
   - "dbconfig-common"
   - "python-pyinotify"
   - "atool"
   - "bagit"
   - "bulk-extractor"
   - "clamav"
   - "clamav-daemon"
   - "ffmpeg"
   - "fits"
   - "gearman"
   - "imagemagick"
   - "inkscape"
   - "jhove"
   - "libimage-exiftool-perl"
   - "libxml2-utils"
   - "logapp"
   - "md5deep"
   - "mediainfo"
   - "nfs-common"
   - "openjdk-7-jre-headless"
   - "p7zip-full"
   - "pbzip2"
   - "postfix"
   - "python-fido"
   - "python-gearman"
   - "python-lxml"
   - "python-mysqldb"
   - "python-pyicu"
   - "python-unidecode"
   - "readpst"
   - "rsync"
   - "siegfried"
   - "sleuthkit"
   - "tesseract-ocr"
   - "tika"
   - "tree"
   - "ufraw"
   - "unrar-free"
   - "uuid"
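For an RPM port, each Ubuntu package name above has to be matched to an RPM equivalent, and the names often differ (Debian's -dev header packages, for example, are usually -devel packages on CentOS). The sketch below is a hypothetical helper, not part of any existing Archivematica tooling: to_rpm_name and its small rename table are illustrative assumptions covering only a few packages from the list, and every name it returns should be checked against the actual CentOS 7 repositories.

```python
from typing import Dict, Tuple

# Hypothetical rename table: Ubuntu package name -> CentOS/RHEL 7 package name.
# Partial and unverified; check every entry against the real repositories.
KNOWN_RENAMES: Dict[str, str] = {
    "python-dev": "python-devel",
    "libxml2-dev": "libxml2-devel",
    "libxslt1-dev": "libxslt-devel",
    "libz-dev": "zlib-devel",
    "libffi-dev": "libffi-devel",
    "libssl-dev": "openssl-devel",
    "apache2-mpm-prefork": "httpd",
    "libapache2-mod-wsgi": "mod_wsgi",
    "imagemagick": "ImageMagick",
    "nfs-common": "nfs-utils",
    "python-mysqldb": "MySQL-python",
}


def to_rpm_name(deb_name: str) -> Tuple[str, bool]:
    """Return (rpm_name, needs_review) for one Ubuntu package name."""
    if deb_name in KNOWN_RENAMES:
        return KNOWN_RENAMES[deb_name], False
    if deb_name.endswith("-dev"):
        # Debian "-dev" header packages are usually "-devel" on RPM systems,
        # but the stem can differ too (libssl-dev vs openssl-devel), so flag it.
        return deb_name[: -len("-dev")] + "-devel", True
    # No rule applies: keep the name and flag it for a manual check.
    return deb_name, True


# Example: translate some Storage Service dependencies from the list above.
for name in ["python-dev", "libssl-dev", "nginx", "uwsgi"]:
    rpm_name, needs_review = to_rpm_name(name)
    print(f"{name:20} -> {rpm_name}{'  (check)' if needs_review else ''}")
```

Names without a rule are passed through unchanged but flagged, so the output doubles as a checklist of packages that still need a human decision.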

Installed Using Python pip

Storage Service

(TODO)

Pipeline (dashboard, MCP Server, MCP Client)

(TODO)

RPM Support

This analysis assumes CentOS/Red Hat 7. A port to CentOS 6 would require more work than is outlined here.

To get Archivematica running on an RPM-based Linux distribution, a set of 5 RPM packages for the different Archivematica applications would need to be created, along with RPMs for about 30 different dependencies.

Archivematica Packages:

  • Archivematica Common
  • Dashboard
  • MCP Server
  • MCP Client
  • Archivematica Storage Service

Optional (not all of these are currently packaged as .debs):

  • MCP RPC Client
  • automation-tools
  • archivematica-devtools

Other Open Source applications required in rpm packages for Archivematica:

  • atool
  • bagit
  • bulk_extractor
  • exiftool (Repoforge has this, but we want to package a specific version)
  • ffmpeg
    • libasound2
    • jack
    • openjpeg - 1.5.x is already packaged, and might be new enough that we can avoid repackaging it
    • libraw1394 - we might be able to live without this
    • libvpx
  • fido
  • fits
  • gearman / python-gearman
  • jhove
  • mediainfo
  • nailgun / nailgun-client
  • nfs - should work without extra packages, but requires testing
  • python-elementtree
  • python-mimeparse
  • siegfried
    • go 1.4
  • sleuthkit
    • afflib
    • libbfio
    • libewf (the version in Repoforge is 3 years out of date)
  • tika
  • ufraw
  • unar
  • unidecode
  • unrar-free
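The "about 30" figure above can be checked by counting this list: 21 top-level applications plus 9 indented sub-dependencies. The sketch below encodes the list as a dictionary and counts it, optionally skipping the entries the notes say might not need packaging (openjpeg, libraw1394 and nfs). The data structure and function name are illustrative only.

```python
# The list above: each application mapped to the indented sub-dependencies
# that may also need their own RPMs.
RPM_CANDIDATES = {
    "atool": [],
    "bagit": [],
    "bulk_extractor": [],
    "exiftool": [],
    "ffmpeg": ["libasound2", "jack", "openjpeg", "libraw1394", "libvpx"],
    "fido": [],
    "fits": [],
    "gearman / python-gearman": [],
    "jhove": [],
    "mediainfo": [],
    "nailgun / nailgun-client": [],
    "nfs": [],
    "python-elementtree": [],
    "python-mimeparse": [],
    "siegfried": ["go 1.4"],
    "sleuthkit": ["afflib", "libbfio", "libewf"],
    "tika": [],
    "ufraw": [],
    "unar": [],
    "unidecode": [],
    "unrar-free": [],
}

# Entries the notes above say might be avoidable.
POSSIBLY_AVOIDABLE = {"openjpeg", "libraw1394", "nfs"}


def count_rpms(candidates, skip=frozenset()):
    """Count applications plus their sub-dependencies, excluding names in skip."""
    names = []
    for app, subdeps in candidates.items():
        names.append(app)
        names.extend(subdeps)
    return sum(1 for n in names if n not in skip)


print(count_rpms(RPM_CANDIDATES))                      # 30
print(count_rpms(RPM_CANDIDATES, POSSIBLY_AVOIDABLE))  # 27
```

So the range is roughly 27 to 30 dependency RPMs, depending on how the openjpeg, libraw1394 and nfs questions are resolved.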

All of the remaining required dependencies are included either in the stock RHEL/CentOS 7 repositories or in the third-party Repoforge repository. (TODO: list which repository each dependency is maintained in.)

Artefactual already produces Ubuntu packages for most of the packages listed here, and we have some tooling in place to automate parts of this process. See https://github.com/artefactual-labs/am-packbuild for an example set of Python scripts used to create Ubuntu packages for the 5 Archivematica packages. (TODO: move Ubuntu packaging scripts to GitHub.)
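The am-packbuild scripts target Ubuntu. As a rough illustration of what an equivalent RPM step could produce, the sketch below renders a minimal .spec skeleton for one Archivematica package. It is not taken from am-packbuild: the render_spec function, the package name, the placeholder version, the AGPLv3 license line and the /usr/share install path are all assumptions, and a real spec would need proper build, install and service sections for each application.

```python
from string import Template

# Hypothetical spec skeleton. License, install path and the single-directory
# layout are assumptions for illustration, not values from am-packbuild.
SPEC_TEMPLATE = Template("""\
Name:           $name
Version:        $version
Release:        1%{?dist}
Summary:        $summary
License:        AGPLv3
Source0:        %{name}-%{version}.tar.gz
BuildArch:      noarch
$requires_line

%description
$summary

%prep
%setup -q

%install
mkdir -p %{buildroot}/usr/share/%{name}
cp -a . %{buildroot}/usr/share/%{name}/

%files
/usr/share/%{name}
""")


def render_spec(name, version, summary, requires):
    """Fill in the template; requires is a list of RPM package names."""
    # Omit the Requires: tag entirely when there are no dependencies,
    # since an empty Requires: line is not valid in a spec file.
    requires_line = "Requires:       " + ", ".join(requires) if requires else ""
    return SPEC_TEMPLATE.substitute(
        name=name,
        version=version,
        summary=summary,
        requires_line=requires_line,
    )


# Example with placeholder values.
print(render_spec(
    "archivematica-common",
    "1.4.1",
    "Shared libraries used by the Archivematica pipeline",
    ["python-lxml", "MySQL-python"],
))
```

string.Template is used rather than str.format because RPM macros such as %{name} contain braces, which str.format would try to interpret.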

Scope

Artefactual has done some analysis of the effort required and, at their request, has provided quotes for this work to some institutions. Feel free to add comments or another subsection here if you have more information about the possible scope of the work.

Artefactual Estimate

We have estimated that it would take about 16-20 hours to create an initial set of packages for all of the required applications.

Another 12-16 hours of testing would be required, at a minimum, to confirm that all the packages work properly and to do very basic tests of Archivematica on CentOS 7. Further testing would be desirable; any testing the community can do would help improve the outcome of this work.

We would probably need another 8 hours of work for project administration and documentation.

Based on these estimates, it would probably take about 40 hours of work to create the required packages, and to test and document the CentOS 7 installation procedure.
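The arithmetic behind the 40-hour figure, treating the 8 hours of administration as fixed: (16-20) + (12-16) + 8 gives a range of 36-44 hours, whose midpoint is 40. A quick check:

```python
# Hour ranges from the estimate above: (low, high) for each phase.
ESTIMATES = {
    "initial packages": (16, 20),
    "testing (minimum)": (12, 16),
    "project administration and documentation": (8, 8),
}

low = sum(lo for lo, _ in ESTIMATES.values())
high = sum(hi for _, hi in ESTIMATES.values())
midpoint = (low + high) / 2

print(f"{low}-{high} hours, midpoint {midpoint:g}")  # 36-44 hours, midpoint 40
```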

Artefactual provides development services on either a fixed fee or a time and materials basis. We offer a less expensive rate for time and materials contracts. Our rates are listed here:

https://www.artefactual.com/services/

Alternative approaches

It is also possible to ship the Archivematica packages and their dependencies as container images. Docker has donated its container format and runtime to the Open Container Initiative. These images are composable, and they can run via the Docker or rkt container runtimes, or independently via runC. Images are easily distributed via services like Docker Hub or Docker Distribution.

A first attempt at this approach has been published [1]: a Docker image containing all of the dependencies of the Archivematica MCP Client package. The image is built on top of an Ubuntu base image, but it can be deployed on any Linux distribution, not just CentOS. Docker images can also be run on a Windows host, via VirtualBox on Windows 8/10, and via Windows Containers on Windows Server 2016.
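As a hypothetical sketch of how an image like this might be described (it is not the published image's actual build file), the function below renders a Dockerfile that installs a handful of packages from the Ubuntu dependency list onto an Ubuntu 14.04 base. The base image tag, the package subset and render_dockerfile are assumptions; packages that Artefactual hosts on Launchpad would additionally require that repository to be added first, which this sketch leaves out.

```python
# Hypothetical sketch: generate a Dockerfile for an image carrying some of the
# MCP Client dependencies. Base tag and package subset are assumptions.


def render_dockerfile(base_image, packages):
    """Return Dockerfile text that installs packages with apt-get."""
    # Deduplicate and sort so the generated file is stable between runs.
    install = " \\\n        ".join(sorted(set(packages)))
    return (
        f"FROM {base_image}\n"
        "RUN apt-get update && \\\n"
        "    apt-get install -y --no-install-recommends \\\n"
        f"        {install} && \\\n"
        # Remove the apt cache in the same layer to keep the image small.
        "    rm -rf /var/lib/apt/lists/*\n"
    )


# A few packages from the Ubuntu list above, assumed to be available from the
# base image's configured Ubuntu archives.
MCP_CLIENT_SUBSET = ["clamav", "imagemagick", "mediainfo", "p7zip-full", "rsync", "tree"]

print(render_dockerfile("ubuntu:14.04", MCP_CLIENT_SUBSET))
```

Because everything is installed inside the image, the host distribution (CentOS or otherwise) only needs a container runtime, which is the main appeal of this approach over building some 30 RPMs.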

It is possible to produce RPM packages that contain Docker images; see, for example, [2].
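One way such a package could work, sketched here under stated assumptions: the image is exported with docker save -o to a tarball, the tarball is shipped as the RPM payload, and a %post scriptlet imports it with docker load -i. The function name, install directory and tarball name below are hypothetical, and a complete spec would also need a Source0 line for the tarball and a Requires line for the Docker package.

```python
# Hypothetical sketch: spec-file sections for an RPM whose payload is a Docker
# image exported beforehand with "docker save -o <tarball> <image>".
# Install directory and tarball name are assumptions for illustration.


def render_image_rpm_sections(tarball, install_dir="/usr/share/archivematica-images"):
    """Return %install, %post and %files sections that ship and load tarball."""
    return (
        "%install\n"
        f"mkdir -p %{{buildroot}}{install_dir}\n"
        f"install -m 0644 %{{SOURCE0}} %{{buildroot}}{install_dir}/{tarball}\n"
        "\n"
        "%post\n"
        "# Import the bundled image into the local Docker image store.\n"
        f"docker load -i {install_dir}/{tarball}\n"
        "\n"
        "%files\n"
        f"{install_dir}/{tarball}\n"
    )


print(render_image_rpm_sections("archivematica-mcp-client.tar"))
```

Loading the image in %post assumes the Docker daemon is running when the RPM is installed; if it is not, the import fails, so a real package might defer the load to a first-start script instead.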


More on Docker from Red Hat: [3]