Micro-services

From Archivematica
Revision as of 15:35, 7 February 2012 by Courtney



[Image: Micro-service.png]

The Archivematica micro-services are granular system tasks that operate on a conceptual entity equivalent to an OAIS information package: a Submission Information Package (SIP), Archival Information Package (AIP), or Dissemination Information Package (DIP). The physical structure of an information package includes files, checksums, logs, XML metadata, etc.

These information packages are moved from one service to the next using the well-established Unix pipeline design pattern. Each micro-service is defined in a simple XML configuration file and associated with a watched directory. When an information package is moved to that directory it triggers the micro-service.

Each service is provided by a combination of Archivematica Python scripts and one or more of the free, open-source software tools bundled in the Archivematica system. Each micro-service results in a success or error state, and the information package is moved accordingly to a success or error directory. Each success or error directory is the watched directory for a subsequent micro-service. This allows for the chaining of directories into complex, custom workflows. Archivematica implements a default ingest-to-access workflow that is compliant with the ISO-OAIS functional model.
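The success/error routing described above can be illustrated with a minimal sketch. This is not Archivematica's actual code: the function name `run_micro_service`, its parameters, and the rule that any raised exception counts as the error state are assumptions made for illustration only.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_micro_service(
    watched_dir: Path,
    success_dir: Path,
    error_dir: Path,
    task: Callable[[Path], None],
) -> dict:
    """Run `task` on every package in `watched_dir` and route it by outcome.

    Hypothetical helper: returns a mapping of package name to the name of
    the directory the package was moved to.
    """
    outcomes = {}
    # sorted() materializes the listing before anything is moved.
    for package in sorted(watched_dir.iterdir()):
        try:
            task(package)
            target = success_dir
        except Exception:
            # Assumption: any exception represents the error state.
            target = error_dir
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(package), str(target / package.name))
        outcomes[package.name] = target.name
    return outcomes
```

Chaining then amounts to passing one service's `success_dir` (or `error_dir`) as the `watched_dir` of the next service.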

Archivematica 0.8 Micro-services Archivematica 0.7.1 Micro-services

Micro-service | Description
Create SIP backup | Creates a backup of the SIP. By default these are stored in /sharedDirectoryStructure/SIPbackups/. The backups are automatically removed at the end of SIP processing, when the AIP has been moved to archival storage.
Verify SIP compliance | Verifies that the SIP conforms to the folder structure required for processing in Archivematica. The structure is as follows: /logs/, /logs/fileMeta/, /metadata/, /metadata/submissionDocumentation/, /objects/.
Assign file UUIDs and checksums | Assigns file UUIDs and generates checksums for each file in the /objects/ directory. This step also creates the PREMIS files located in the /logs/fileMeta/ directory. Each file in this directory is named after the fileUUID of the file it represents.
Verify metadata directory checksums | Checks any checksum files that were placed in the /metadata/ folder of the SIP prior to ingest. Note that the checksum files must be named after their algorithm: checksum.sha1, checksum.sha256, checksum.md5.
Remove thumbs.db files | Removes any Thumbs.db files. May be expanded to other file types in future releases.
Create Dublin Core template | If the ingested SIP does not already contain one, a Dublin Core XML template is added to the /metadata/ folder in the SIP. The user can fill in fields as desired. These values are uploaded to the access system as part of the DIP created by Archivematica.
Set file permissions | Changes file permissions on the SIP to allow the user to modify the SIP contents.
Appraise SIP for submission | Manual approval step. Review the SIP to confirm that it conforms to any submission agreements, and remove files and folders if desired. Do not move or rename files or folders, as this will cause them to be excluded from the AIP.
Scan for removed files post appraise SIP for submission | Checks whether any files were deleted and creates a list of them at /logs/removedFilesAppraiseSIPForSubmission.log.
Place in quarantine | Places the SIP in quarantine for a pre-set period of time. The purpose of this is to allow time for new viruses to be identified and for antivirus groups to update their virus definitions. Note: for demonstration purposes, the quarantine period is set to one minute.
Remove from quarantine | Archivematica uses a cron job to periodically check for SIPs that have completed the configured quarantine period. Keeping in mind the purpose of the quarantine period, if you know the virus definitions are up to date for any virus possibly contained in the SIP (e.g. the SIP source is a CD from 4 years ago), you can remove it from quarantine manually.
Extract packages | Extracts objects from any zipped files or other packages.
Sanitize file and directory names | Some file systems do not support Unicode or other special characters in filenames. This micro-service removes prohibited characters and replaces them with dashes. Original filenames are preserved in the PREMIS metadata.
Scan for viruses | Scans the SIP with ClamAV, parses the output, and creates a PREMIS event. If a virus is found, the SIP is automatically placed in /sharedPath/watchedDirectories/failed/.
Characterize and extract metadata | Identifies and validates formats and extracts object metadata using the File Information Tool Set (FITS). Adds the output to the PREMIS files.
Set file permissions | Changes file permissions on the SIP to allow the user to modify the SIP contents.
Appraise SIP for preservation | Manual approval step. If desired, appraise SIP contents for preservation and delete any unwanted files and folders. Do not move or rename files or folders, as this will cause them to be excluded from the AIP. Note: in future releases, appraisal decisions at this point will be assisted by a summary of technical information about the files (format, validation status, characteristics such as compression, etc.).
Scan for removed files post appraise SIP for preservation | Checks whether any files were deleted and creates a list of them at /logs/removedFilesAppraiseSIPForPreservation.log.
Create DIP directory | Creates a directory for access copies of the ingested files.
Normalize | Creates preservation copies and access copies of the ingested files based on rules in the transcoder database. These rules can be seen under the Preservation planning tab in the Archivematica dashboard. Adds access copies to the DIP directory.
Set file permissions | Changes file permissions on the SIP to allow the user to modify the SIP contents.
Approve normalization | Manual approval step. This micro-service allows for manual normalization. See 0.7.1_How-To#Manual_Normalization. Note that the user cannot delete normalized files: see issue 678.
Check for submission documentation | Checks for files in /metadata/submissionDocumentation/; if the folder is empty, creates a log file indicating that no submission documentation was included in the SIP.
Move Submission Documentation into objects directory | Moves the /submissionDocumentation/ directory from /metadata/ to /objects/ for ingest processing.
Assign file UUIDs and checksums to submission documentation | See Assign file UUIDs and checksums, above.
Extract packages in submission documentation | See Extract packages, above.
Sanitize file and directory names in submission documentation | See Sanitize file and directory names, above.
Scan for viruses in submission documentation | See Scan for viruses, above.
Characterize and extract metadata in submission documentation | See Characterize and extract metadata, above.
Normalize submission documentation | See Normalize, above. No access copies are made, since the submission documentation is not included in the DIP.
Remove files without PREMIS | Removes any files in the /objects/ directory that don't have PREMIS entries. This is done because some failed normalizations still leave behind artifacts (for example, 0-byte files) that don't belong in the AIP.
Verify PREMIS checksums | Verifies the checksums assigned at ingest to ensure that the files have not been modified while being processed by Archivematica.
Compile METS | Creates a METS.xml file using the PREMIS files in /logs/fileMeta/.
Add Dublin Core to METS | Adds the /metadata/dublincore.xml file to the METS.xml file.
Copy METS to DIP directory | Creates a copy of the METS.xml file in the DIP directory.
Generate DIP | Moves the DIP directory out of the SIP and into another Archivematica watched directory.
Set file permissions | Changes file permissions on the DIP to allow the user to modify the DIP contents.
Prepare AIP | Packages the SIP into an AIP using BagIt.
Upload DIP | Uploads the DIP to the access system (ICA-AtoM). The upload title is the filename with the UUID removed; the original filename still contains the UUID, so it can be traced back to the file in the AIP.
Store AIP | Moves the AIP to a specified directory. In the demo version of Archivematica the directory is /sharedDirectoryStructure/www/AIPsStore/. In other environments it can be a remote network-mounted directory. The directory structure of the AIP store contains UUID quad directories and an index.html file listing the AIPs in storage. The index.html file is displayed in the dashboard in the Archival storage tab.
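As an illustration of the "Assign file UUIDs and checksums" step listed above, the following is a minimal sketch only. The function name `assign_uuids_and_checksums` is hypothetical, the choice of SHA-256 is an assumption (the table does not say which algorithm this step uses), and the sketch returns a dictionary rather than writing the PREMIS files that the real micro-service creates in /logs/fileMeta/.

```python
import hashlib
import uuid
from pathlib import Path


def assign_uuids_and_checksums(objects_dir: Path) -> dict:
    """Map each file under objects_dir to a new UUID and its SHA-256 digest.

    Hypothetical helper; keys are POSIX-style paths relative to objects_dir.
    """
    records = {}
    for path in sorted(objects_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()  # Assumption: algorithm chosen for the sketch.
        with path.open("rb") as handle:
            # Read in chunks so large objects are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        records[path.relative_to(objects_dir).as_posix()] = {
            "uuid": str(uuid.uuid4()),
            "sha256": digest.hexdigest(),
        }
    return records
```

A later verification step, such as "Verify PREMIS checksums", could recompute the digests in the same way and compare them with the stored values.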

Once the AIP has been stored, a copy of the AIP is extracted from storage to a local temp directory, and is validated with the various BagIt checks: verifyvalid, checkpayloadoxum, verifycomplete, verifypayloadmanifests, verifytagmanifests.
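To give a rough idea of what two of these checks involve, here is a minimal standard-library sketch. It is not the BagIt tool's implementation. It assumes the layout defined by the BagIt specification: a data/ payload directory, a `Payload-Oxum: <octets>.<files>` entry in bag-info.txt, and a manifest-<algorithm>.txt file whose lines hold a checksum followed by a path. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def payload_oxum(bag_dir: Path) -> str:
    """Return 'octets.files' computed over the bag's data/ payload directory."""
    files = [p for p in (bag_dir / "data").rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"


def declared_oxum(bag_dir: Path):
    """Read the Payload-Oxum value from bag-info.txt, or None if absent."""
    info = bag_dir / "bag-info.txt"
    if not info.is_file():
        return None
    for line in info.read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Payload-Oxum":
            return value.strip()
    return None


def manifest_mismatches(bag_dir: Path, algorithm: str = "sha256") -> list:
    """List payload paths whose digest differs from manifest-<algorithm>.txt.

    A file named in the manifest but missing on disk is also reported.
    """
    bad = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag_dir / rel_path
        actual = None
        if target.is_file():
            actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            bad.append(rel_path)
    return bad
```

In this sketch, a bag would pass the oxum comparison when `payload_oxum(bag) == declared_oxum(bag)` and the manifest comparison when `manifest_mismatches(bag)` returns an empty list.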



Archivematica 0.7.1 micro-services: additional micro-services for ingested bags

For the 0.7.1 release we are alpha testing the ingest of bags compliant with the BagIt specification. If a bag is dropped into /receiveBAG/ the following micro-services are immediately run:


Micro-service | Description
Verify BAG | Runs the various BagIt checks: verifyvalid, checkpayloadoxum, verifycomplete, verifypayloadmanifests, verifytagmanifests.
Restructure BAG | Restructures the bag into the Archivematica-compliant SIP format. The structure is as follows: /logs/, /logs/fileMeta/, /metadata/, /metadata/submissionDocumentation/, /objects/.
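The required SIP folder structure listed above lends itself to a simple check. The sketch below is illustrative only: `missing_sip_dirs` and `create_sip_skeleton` are hypothetical helper names, and creating empty directories is merely a stand-in for the actual restructuring of a bag, which would also need to relocate the bag's contents.

```python
from pathlib import Path

# Directories named in the text above, relative to the SIP root.
REQUIRED_DIRS = (
    "logs",
    "logs/fileMeta",
    "metadata",
    "metadata/submissionDocumentation",
    "objects",
)


def missing_sip_dirs(sip_root: Path) -> list:
    """Return the required directories that are absent from sip_root."""
    return [d for d in REQUIRED_DIRS if not (sip_root / d).is_dir()]


def create_sip_skeleton(sip_root: Path) -> None:
    """Create any missing required directories, leaving existing ones alone."""
    for d in REQUIRED_DIRS:
        (sip_root / d).mkdir(parents=True, exist_ok=True)
```

A SIP would count as structurally compliant in this sketch when `missing_sip_dirs` returns an empty list.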