Improvements/CentOS-RedHat support

This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

User Story

As a systems administrator, I would like to be able to run Archivematica on an RPM-based Linux distribution such as CentOS or Red Hat.


Status

Sponsored development began in early February 2016.

Instructions for installing Archivematica on CentOS are available at Improvements/CentOS-RedHat_support/Installation. Note: these instructions create a QA installation of Archivematica, which is not yet recommended for production use.


There is also an Ansible playbook capable of getting Archivematica running on CentOS: https://github.com/artefactual-labs/ansible-role-archivematica-src/tree/dev/centos . It is a work in progress and is not yet ready for full testing.

Interest

Please feel free to add your organization's name to this list if you have an interest in this improvement.

Artefactual would like to see this improvement developed. We are able to do the development work for a fee. We are also willing to assist others in completing all or part of the work required, in order to reduce the scope to a level where Artefactual can complete the process as part of our existing commitment to provide new packages with each release.

The National Library of Wales would like to see this developed. We have been building Archivematica on RHEL for a couple of years now, and would like to see an official rpm distribution. We have contacted Artefactual regarding the work.

Analysis:

The Current Situation

Currently, Archivematica only works on LTS versions of Ubuntu (14.04 and 12.04). This is not due to any specific limitation in the Archivematica codebase; it is entirely a matter of packaging and bundling the large number of dependencies required in an Archivematica installation.

Archivematica consists of a set of 5 separate packages, written primarily in Python. Artefactual currently builds .deb packages for Ubuntu of all 5 packages, and makes them available via Launchpad (e.g., the most current stable version is at https://launchpad.net/~archivematica/+archive/ubuntu/1.4).

It is not necessary to use those .deb packages to install Archivematica. For example, see these instructions for installing Archivematica using Ansible and Vagrant (https://wiki.archivematica.org/Getting_started#Installation). In that example, the 5 Archivematica packages are installed directly from GitHub, as source, not from .deb packages. However, the longer list of external dependencies is still installed from .deb packages.

Some of those .deb packages are available in the standard Ubuntu repositories, like apache and mysql. Others, like jhove, are in the Ubuntu multiverse repository, which is not always enabled by default in Ubuntu, but is easy to enable. Others are either not available from Ubuntu repositories, or the versions available there are too old to work with Archivematica. For these packages, Artefactual has taken on the responsibility for building .deb packages and hosting them on Launchpad. Several are listed at https://launchpad.net/~archivematica/+archive/ubuntu/1.4, including bagit, bulk_extractor, ffmpeg and others.

List of Archivematica's Ubuntu Package Dependencies

This list is based on the packages installed by the Ansible role ( https://github.com/artefactual-labs/ansible-role-archivematica-src )

Note that it is planned to move some package based dependencies to pip ( https://github.com/artefactual/archivematica/pull/398 )

Installed Using Packages
Storage Service
   - "python-lxml"
   - "nginx"
   - "unar"
   - "uwsgi"
   - "uwsgi-plugin-python"
   - "python-virtualenv"
   - "python-dev"
   - "libxml2-dev"
   - "libxslt1-dev"
   - "libz-dev"
   - "libffi-dev"
   - "libssl-dev"
Pipeline (dashboard, MCP Server, MCP Client)
   - "python"
   - "python-pip"
   - "python2.7-elementtree"
   - "python-mimeparse"
   - "python-dateutil"
   - "apache2-mpm-prefork"
   - "libapache2-mod-wsgi"
   - "python-pip"
   - "python-gearman"
   - "python-simplejson"
   - "dbconfig-common"
   - "logapp"
   - "python-pyinotify"
   - "python-gearman"
   - "python-mysqldb"
   - "python-lxml"
   - "uuid"
   - "atool"
   - "bagit" (*)
   - "bulk-extractor" (*)
   - "clamav"
   - "clamav-daemon"
   - "ffmpeg" (*)
   - "fits" (*)
   - "gearman"
   - "imagemagick"
   - "inkscape"
   - "jhove"
   - "libimage-exiftool-perl" (*)
   - "libxml2-utils"
   - "logapp"
   - "md5deep"
   - "mediainfo"
   - "nfs-common"
   - "openjdk-7-jre-headless"
   - "p7zip-full"
   - "pbzip2"
   - "postfix"
   - "python-fido" (*)
   - "python-gearman"
   - "python-lxml"
   - "python-mysqldb"
   - "python-pyicu"
   - "python-unidecode"
   - "readpst"
   - "rsync"
   - "siegfried" (*)
   - "sleuthkit" (*)
   - "tesseract-ocr"
   - "tika" (*)
   - "tree"
   - "ufraw"
   - "unrar-free"
   - "uuid"

Packages with an (*) have been built by Artefactual, as they do not exist in the public Ubuntu repositories, or the versions found in the public repositories were not recent enough for Archivematica to work properly.

RPM Support

This analysis is based on the use of CentOS/Red Hat 7. A port to CentOS 6 would require more work than is outlined here.

In order to get Archivematica running on an RPM-based Linux distribution, a set of 5 RPM packages for the different Archivematica applications would need to be created. RPMs for about 30 different dependencies would also have to be created.

Archivematica Packages:

  • Archivematica Common
  • Dashboard
  • MCP Server
  • MCP Client
  • Archivematica Storage Service

Optional (not all of these are currently being packaged as .debs):

  • MCP RPC Client
  • automation-tools
  • archivematica-devtools

Other Open Source applications required in rpm packages for Archivematica:

  • atool
  • bagit
  • bulk_extractor
  • exiftool (repoforge has this, but we want to package a specific version)
  • ffmpeg
    • libasound2
    • jack
    • openjpeg - 1.5.x is packaged, might be new enough to avoid packaging
    • libraw1394 - we might be able to live without this
    • libvpx
  • fido
  • fits
  • gearman / python-gearman
  • jhove
  • mediainfo
  • nailgun / nailgun-client
  • nfs - should work without extra packages, but requires testing
  • python-elementtree
  • python-mimeparse
  • siegfried
    • go 1.4
  • sleuthkit
    • afflib
    • libbfio
    • libewf (version in repoforge is 3 years older)
  • tika
  • ufraw
  • unar
  • unidecode
  • unrar-free

All of the remaining required dependencies are either included in the stock RHEL/CentOS 7 repositories, or in the third-party Repoforge repository. (TODO: list which repo each dependency is maintained in).
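The nested entries in the dependency list above imply a build order: a sub-dependency has to be packaged before the application that needs it. The following is a minimal sketch of how that order could be derived, assuming Python 3.9 or later for the standard-library `graphlib` module. The `BUILD_REQUIRES` table and the `build_order` helper are hypothetical names introduced for illustration. The table simply mirrors the nesting in the list; whether each entry is a build-time or a run-time requirement, and whether openjpeg and libraw1394 need packaging at all, would still have to be confirmed.

```python
from graphlib import TopologicalSorter

# Hypothetical table mirroring the nested entries in the list above.
# Keys are packages; values are the packages assumed to be needed first.
BUILD_REQUIRES = {
    "ffmpeg": {"libasound2", "jack", "openjpeg", "libraw1394", "libvpx"},
    "siegfried": {"go"},
    "sleuthkit": {"afflib", "libbfio", "libewf"},
}


def build_order(requires):
    """Return a list in which every package appears after its prerequisites."""
    return list(TopologicalSorter(requires).static_order())


order = build_order(BUILD_REQUIRES)
for name in order:
    print(name)
```

Packages without sub-dependencies (atool, bagit, jhove and so on) can be built in any order, so they are left out of the table.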

Artefactual already produces Ubuntu packages for most of the packages listed here, and we have some tooling in place to automate some of this process. See https://github.com/artefactual-labs/am-packbuild for an example set of Python scripts used to create Ubuntu packages for the 5 Archivematica packages. (TODO: move Ubuntu packaging scripts to GitHub).

Scope

Artefactual has done some analysis of the effort required, and provided quotes for this work to some institutions, at their request. Feel free to add additional comments or another piece to this section if you have more information about the possible scope of work required.

Proposed Approach

  1. Check whether the Ubuntu package dependencies have equivalent RPMs in CentOS. We expect that the Ubuntu packages provided by the standard Ubuntu repositories have an equivalent in the standard CentOS repositories, and that all of the Ubuntu packages built by Artefactual are missing in CentOS and will need to be built as well
  2. Build the missing RPM package dependencies for CentOS
  3. Fix the Ansible role in https://github.com/artefactual-labs/ansible-role-archivematica-src so that it can install Archivematica / Storage Service on CentOS
  4. Repeat the steps above as required until we obtain a working CentOS installation
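Step 1 above can be sketched as a simple partition of the Ubuntu dependency list into packages that already have a CentOS equivalent and packages that still need to be built. This is only an illustration under stated assumptions: `UBUNTU_TO_CENTOS` and `split_dependencies` are hypothetical names, the name mapping shows a few common `-dev` to `-devel` renames that would each need to be verified against the real repositories, and the set of available RPM names is supplied by the caller rather than queried from yum.

```python
# Hypothetical Ubuntu -> CentOS name mapping (illustrative, to be verified).
# Packages not listed here are assumed to keep the same name.
UBUNTU_TO_CENTOS = {
    "python-dev": "python-devel",
    "libxml2-dev": "libxml2-devel",
    "libxslt1-dev": "libxslt-devel",
    "libz-dev": "zlib-devel",
    "libffi-dev": "libffi-devel",
    "libssl-dev": "openssl-devel",
}


def split_dependencies(ubuntu_packages, available_rpms):
    """Partition Ubuntu package names into (found, missing) for CentOS.

    available_rpms is a set of RPM names collected by the caller,
    for example from the enabled repositories' package listings.
    """
    found, missing = {}, []
    for name in ubuntu_packages:
        rpm_name = UBUNTU_TO_CENTOS.get(name, name)
        if rpm_name in available_rpms:
            found[name] = rpm_name
        else:
            missing.append(name)
    return found, missing


# Made-up inputs for demonstration; not real repository contents.
found, missing = split_dependencies(
    ["nginx", "libssl-dev", "bagit", "siegfried"],
    {"nginx", "openssl-devel"},
)
print("available:", found)
print("to build:", missing)
```

The `missing` list is the input to step 2, and rerunning the check after each round of packaging corresponds to step 4.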

Artefactual Estimate

We have estimated that it would take about 16-20 hours to create an initial set of packages for all of the required applications.

Another 12-16 hours of testing would be required, at a minimum, to confirm that all the packages are working properly, and to do very basic tests of Archivematica on CentOS 7. Further testing would be desirable; any testing that could be done by the community would help improve the outcome of this work.

We would probably need another 8 hours of work for project administration and documentation.

Based on these estimates, it would probably take about 40 hours of work to create the required packages and test and document the CentOS 7 installation procedure.
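As a quick check of the arithmetic, the three ranges quoted above can be summed directly. The variable names are illustrative only; the hour figures are the ones given in this section.

```python
# (low, high) hour ranges quoted in the estimate above.
ESTIMATES = {
    "packaging": (16, 20),
    "testing": (12, 16),
    "administration and documentation": (8, 8),
}

low = sum(lo for lo, _ in ESTIMATES.values())
high = sum(hi for _, hi in ESTIMATES.values())
midpoint = (low + high) / 2

print(f"total: {low}-{high} hours, midpoint {midpoint:g}")
```

The total range is 36-44 hours, which is consistent with the figure of about 40 hours.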

Artefactual provides development services on either a fixed fee or a time and materials basis. We offer a less expensive rate for time and materials contracts. Our rates are listed here:

https://www.artefactual.com/services/

Alternative approaches

It's also possible to ship the Archivematica packages and their dependencies as container images. Docker has donated its container format and runtime to the Open Container Initiative. These images are composable, and they can run via the Docker or rkt container runtimes or independently via runC. Images are easily distributed via services like Docker Hub or Docker Distribution.

A first attempt at this approach has been published [1]. This is a Docker image containing all of the dependencies of the Archivematica MCP Client package. The image is built on top of an Ubuntu base image, but it can be deployed on any Linux distribution, not just CentOS. Docker images can also be run on a Windows host, via VirtualBox on Windows 8/10, and via Windows Containers on Windows Server 2016.

It is possible to produce RPM packages that contain Docker images, for example: [2].


More on Docker from RedHat: [3]