Difference between revisions of "Micro-services"


Revision as of 17:32, 20 February 2011


Micro-service.png

The Archivematica micro-services are granular system tasks that operate on a conceptual entity equivalent to an OAIS information package: a Submission Information Package (SIP), Archival Information Package (AIP) or Dissemination Information Package (DIP). The physical structure of an information package includes files, checksums, logs, XML metadata, etc.

These information packages are moved from one service to the next using the long-established Unix pipeline design pattern. Each micro-service is defined in a simple XML configuration file and associated with a watched directory. When an information package is moved to that directory, it triggers the micro-service.

Each service is provided by a combination of Archivematica Python scripts and one or more open-source software tools bundled in the Archivematica system. Each micro-service results in a success or error state, and the information package is moved accordingly to a success or error directory. Each success or error directory is the watched directory for a subsequent micro-service. This allows directories to be chained into complex, custom workflows. Archivematica implements a default ingest-to-access workflow that is compliant with the ISO-OAIS functional model.
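
The chaining of watched directories described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Archivematica's implementation: the `MicroService` class, the `run_once` and `require_metadata_dir` functions and the directory names are all hypothetical, and a real system would react to filesystem events rather than being called by hand.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class MicroService:
    """One pipeline stage bound to a watched directory (hypothetical model)."""
    name: str
    watched: Path                  # a package arriving here triggers the task
    success: Path                  # destination when the task succeeds
    error: Path                    # destination when the task fails
    task: Callable[[Path], None]   # raises an exception to signal an error state


def run_once(service: MicroService) -> list[tuple[str, str]]:
    """Process every package currently sitting in the watched directory."""
    results = []
    for package in sorted(service.watched.iterdir()):
        try:
            service.task(package)
            target, state = service.success, "success"
        except Exception:
            target, state = service.error, "error"
        # Moving the package is what hands it to the next stage.
        shutil.move(str(package), str(target / package.name))
        results.append((package.name, state))
    return results


def require_metadata_dir(package: Path) -> None:
    """Toy task: fail unless the package contains a metadata subdirectory."""
    if not (package / "metadata").is_dir():
        raise ValueError(f"{package.name}: missing metadata directory")


def demo() -> dict[str, list[str]]:
    """Chain two stages: the first success directory is the second watched directory."""
    root = Path(tempfile.mkdtemp())
    dirs = {n: root / n for n in ("incoming", "verified", "stored", "errors")}
    for directory in dirs.values():
        directory.mkdir()
    (dirs["incoming"] / "sip-good" / "metadata").mkdir(parents=True)
    (dirs["incoming"] / "sip-bad").mkdir()

    verify = MicroService("verify", dirs["incoming"], dirs["verified"],
                          dirs["errors"], require_metadata_dir)
    store = MicroService("store", dirs["verified"], dirs["stored"],
                         dirs["errors"], lambda package: None)
    for service in (verify, store):
        run_once(service)
    return {n: sorted(p.name for p in directory.iterdir())
            for n, directory in dirs.items()}
```

Calling `demo()` builds a temporary directory tree, pushes one well-formed and one malformed package through both stages, and returns the final contents of each directory. Because a stage only knows its three directories, stages can be rewired into a different workflow without changing any task code.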

Archivematica Micro-services

Micro-service Description
backupSIP Create a backup of the entire SIP as soon as it is ingested.
verifySIPcompliance Verify that the SIP conforms to the folder structure required for processing in Archivematica.
assignIdentifier Each file in the SIP is assigned a universally unique identifier (UUID) and a SHA-1 checksum for future integrity checks.
verifyChecksums If the ingested SIP already contains a checksum file, this micro-service will check it to confirm that none of the files were deleted or altered upon transfer to Archivematica.
createDublinCore If the ingested SIP does not already contain one, a Dublin Core XML template is added to the metadata folder in the SIP. The user can fill in fields as desired. These values are uploaded to the access system as part of the DIP processed by Archivematica.
appraiseForSubmission The user may review the SIP to confirm that it complies with any submission agreements. The user can delete unwanted files at this point; Archivematica will keep a log of the deleted files.
quarantine The SIP is placed in quarantine for a pre-set period of time. The user can move the SIP out of quarantine before the pre-set time has expired, if desired.
extractPackages Files are extracted from any .zip, .tar or other file package formats; each extracted file is assigned a universally unique identifier (UUID) and a SHA-1 checksum.
sanitizeNames Prohibited characters that may cause processing errors on known operating systems (e.g. spaces or ampersands) are replaced with underscores in file and directory names.
virusScan ClamAV scans all files in the SIP. In the event that a virus or other malware is found, the SIP is placed in a folder called SIPerrors and all processing on the SIP is stopped.
validateFormatsAndExtractMetadata File formats are identified and the files are validated against external format specifications. Technical metadata is extracted from each file.
appraiseForPreservation The user may appraise the contents of the SIP and delete unwanted files. Archivematica will keep a log of the deleted files.
transcode Transcode each file in the SIP into a preservation-format copy and an access-format copy according to its media type preservation plan. These are packaged along with the original file in the AIP.
compilePreservationMetadata Compile a METS file with a complete set of PREMIS metadata for each ingested file. The technical metadata that were extracted during the "validateFormatsAndExtractMetadata" micro-service are placed in the PREMIS objectCharacteristicsExtension element.
createAIPchecksum Generate a checksum for all AIP contents.
prepareAIP Package the AIP using the Library of Congress BagIt specification.
storeAIP The user may review the AIP and approve it for archival storage. The AIP is moved into the AIPsStore folder, which is synced to the storage system.
generateDIP The access copies that were created during the "transcode" micro-service are placed in a DIP folder and the METS file is added to the DIP.
uploadDIP The user may review the DIP and remove any access copies that cannot be sent to the public access system due to copyright, security or other issues. The user then approves the DIP for upload and the DIP is uploaded into the public access system (in Archivematica, the default access system is the open-source archival description tool ICA-AtoM). A backup copy of the DIP, including files that were deleted, is sent to the DIPbackups folder.
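
As a rough illustration of two of the micro-services listed above, assignIdentifier and sanitizeNames, the following Python sketch assigns a UUID and a SHA-1 checksum to each file and replaces prohibited characters with underscores. It is a minimal sketch under assumptions and not the Archivematica scripts themselves: the function names are hypothetical, and the set of characters treated as prohibited is a guess, since the table names only spaces and ampersands as examples.

```python
import hashlib
import re
import uuid
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assign_identifiers(sip_dir: Path) -> dict[str, dict[str, str]]:
    """Map each file path (relative to the SIP) to a fresh UUID and its SHA-1."""
    records = {}
    for path in sorted(p for p in sip_dir.rglob("*") if p.is_file()):
        records[path.relative_to(sip_dir).as_posix()] = {
            "uuid": str(uuid.uuid4()),
            "sha1": sha1_of(path),
        }
    return records


# Assumption: any character outside this set counts as prohibited.
PROHIBITED = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name: str) -> str:
    """Replace each prohibited character in a single name with an underscore."""
    return PROHIBITED.sub("_", name)
```

With these definitions, `sanitize_name("annual report & notes.txt")` gives `annual_report___notes.txt`: each of the two spaces around the ampersand, and the ampersand itself, becomes its own underscore. Recomputing `sha1_of` later and comparing it with the stored value is the kind of integrity check the checksum is kept for.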