Forensic imaging steps for 1.1
Archivematica 1.0 changed how several processes work, which in turn changes the scope of what must be implemented for forensic disk imaging. This document outlines the mandatory steps that still need to be completed for forensic imaging in 1.1, along with some additional steps that would let Archivematica generalize the functionality into the standard transfer.
Users must be able to add disk imaging metadata
Forensic image transfers need to provide the ability to include some metadata at the beginning of the transfer.
This is partially implemented and needs to be rebased onto the 1.0 branch.
File identification commands must recognize disk images
The new extraction model is based on the FPR and therefore requires file identification, so the identification microservices must be able to identify disk images before those images can be extracted.
Forensic tools must be packaged
Disk image extraction commands must be added to the FPR
Extracted package deletion must be optional
Currently, whether or not extracted packages will be retained after decompression is hardcoded in the package extraction script, as it was in the old extraction code. (The current behaviour is to always delete the package after decompression.) This must be made optional via user choice in the UI, and should be exposed as a persistent option in the processing configuration.
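A minimal sketch of what a configurable deletion step might look like, assuming a ZIP package and a hypothetical `delete_after` flag standing in for the processing configuration option. The function names are illustrative and are not taken from the Archivematica codebase.

```python
import os
import tempfile
import zipfile


def extract_package(package_path, dest_dir, delete_after=True):
    """Extract a ZIP package into dest_dir, then optionally delete the package.

    `delete_after` is a hypothetical stand-in for the user's choice; the
    default of True mirrors the current always-delete behaviour.
    """
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(dest_dir)
    if delete_after:
        os.remove(package_path)
    return dest_dir


def demo(delete_after):
    """Build a throwaway ZIP, extract it, and report whether the ZIP survived."""
    workdir = tempfile.mkdtemp()
    package = os.path.join(workdir, "example.zip")
    with zipfile.ZipFile(package, "w") as archive:
        archive.writestr("hello.txt", "hello")
    extract_package(package, os.path.join(workdir, "out"), delete_after=delete_after)
    return os.path.exists(package)
```

Keeping the decision in a single parameter means the value can come from either an interactive prompt or a stored processing configuration without changing the extraction code itself.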
Users should be offered the choice of whether to extract packages
Whether packages are extracted at all must be a user choice in the UI, and should be exposed as a persistent option in the processing configuration.
New "Examine Contents" microservice must be added
This step runs the bulk_extractor tool and indexes the output to allow for later visualization and examination.
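As a rough illustration of the two halves of this step, the sketch below builds a basic bulk_extractor command line and parses feature-file lines into records that could later be indexed. It assumes the feature files are tab-separated (offset, feature, context) with `#` comment lines. The function names and the sample data are hypothetical, and nothing here reflects how the microservice is actually implemented.

```python
def build_bulk_extractor_command(image_path, output_dir):
    """Return the argument list for a basic bulk_extractor run.

    `bulk_extractor -o <output_dir> <image>` writes its feature files
    into output_dir.
    """
    return ["bulk_extractor", "-o", output_dir, image_path]


def parse_feature_lines(lines):
    """Parse feature-file lines into dicts, skipping comments and blank lines.

    Assumes each data line is: offset <TAB> feature <TAB> context.
    """
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a recognizable feature line
        features.append({
            "offset": parts[0],
            "feature": parts[1],
            "context": parts[2] if len(parts) > 2 else "",
        })
    return features
```

The command list can be passed to `subprocess.run`, and the parsed records handed to whatever indexing backend is chosen.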
New characterization scripts must be written for fiwalk
It has previously been suggested that Archivematica use Mark Matienzo's fiwalk configuration, which uses FIDO. This may no longer be necessary now that FIDO is implemented as a general identification tool: extracted contents will always be identifiable using FIDO if the user selects it as their identification tool.
Alternate characterization tools should be implemented
Disk image characterization should be done with fiwalk.
Currently the "characterize and extract metadata" step always uses FITS, but in 1.0 the groundwork was laid for allowing this to be controllable using the FPR instead. If this is completed, then we can simply write FPR rules to control characterization of disk images.
Provide robust identification fallbacks using additional microservice(s)
Currently identification happens using a single tool; if that tool fails, the file is not identified. We provide a single-case fallback in the scripts that handle file identification and FIDO. A more robust fallback would be desirable, e.g. allowing individual files to fall back to other IDTools if identification fails. This would allow alternate tools to provide identification results for things that FIDO currently can't identify, such as disk images, without needing to clutter the existing scripts.
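The fallback idea can be sketched as an ordered chain of identifiers, each tried only if the previous one returned nothing. This is an illustration under assumptions: the two identifier functions are hypothetical stand-ins that look only at file extensions, and the returned labels are placeholders rather than real FPR or PRONOM entries.

```python
def identify_with_fallback(path, identifiers):
    """Try each (name, function) pair in order.

    Returns (tool_name, result) for the first tool that yields a result,
    or (None, None) if every tool fails.
    """
    for name, identify in identifiers:
        result = identify(path)
        if result is not None:
            return name, result
    return None, None


def primary_tool(path):
    """Hypothetical primary identifier: only knows JPEG by extension."""
    return "image/jpeg" if path.lower().endswith(".jpg") else None


def disk_image_tool(path):
    """Hypothetical fallback identifier for common disk image extensions."""
    if path.lower().endswith((".dd", ".img", ".e01")):
        return "disk-image (placeholder label)"
    return None


CHAIN = [("primary", primary_tool), ("disk-image-fallback", disk_image_tool)]
```

For example, `identify_with_fallback("floppy.img", CHAIN)` falls through the primary tool and is answered by the fallback, while a file neither tool recognizes yields `(None, None)`.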
Recursive package extraction
The current package extraction code extracts in one pass. If a package contains additional packages that Archivematica can extract, they currently won't be extracted. The code should be updated to allow extraction of nested packages: for instance, ZIP files containing other ZIPs; tarballs in uncommon compression formats (such as .tar.xz); and disk images containing compressed archives.
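A minimal sketch of multi-pass extraction, assuming only ZIP and tar archives (which Python's standard library can open) and a hypothetical `.contents` suffix for the directory each nested package is unpacked into. Disk images and FPR-driven command selection are out of scope here, and the depth limit is an illustrative safeguard rather than a stated requirement.

```python
import os
import tarfile
import tempfile
import zipfile


def is_package(path):
    """True if the standard library can open the file as a ZIP or tar archive."""
    return zipfile.is_zipfile(path) or tarfile.is_tarfile(path)


def extract_once(package_path, dest_dir):
    """Extract a single archive. Real code should guard against unsafe paths."""
    if zipfile.is_zipfile(package_path):
        with zipfile.ZipFile(package_path) as archive:
            archive.extractall(dest_dir)
    else:
        with tarfile.open(package_path) as archive:
            archive.extractall(dest_dir)


def extract_recursive(package_path, dest_dir, max_depth=5):
    """Extract a package, then any packages found inside it.

    Returns the list of package paths that were extracted, outermost first.
    """
    extract_once(package_path, dest_dir)
    extracted = [package_path]
    if max_depth <= 0:
        return extracted
    # Snapshot the file list first so directories created below are not re-walked.
    candidates = [
        os.path.join(root, name)
        for root, _dirs, names in os.walk(dest_dir)
        for name in names
    ]
    for candidate in candidates:
        if is_package(candidate):
            extracted += extract_recursive(
                candidate, candidate + ".contents", max_depth - 1
            )
    return extracted


def demo():
    """Build a ZIP containing another ZIP and extract both levels."""
    workdir = tempfile.mkdtemp()
    inner = os.path.join(workdir, "inner.zip")
    outer = os.path.join(workdir, "outer.zip")
    with zipfile.ZipFile(inner, "w") as archive:
        archive.writestr("deep.txt", "nested content")
    with zipfile.ZipFile(outer, "w") as archive:
        archive.write(inner, arcname="inner.zip")
        archive.writestr("top.txt", "top-level content")
    out = os.path.join(workdir, "out")
    names = [os.path.basename(p) for p in extract_recursive(outer, out)]
    deep = os.path.join(out, "inner.zip.contents", "deep.txt")
    return names, os.path.exists(deep)
```

Note that content-based detection will also treat ZIP-based formats such as office documents as packages, so a real implementation would likely need identification results to decide what counts as extractable.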