Forensic imaging steps for 1.1

From Archivematica
Revision as of 15:51, 11 February 2020 by Sallain (talk | contribs)


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

Archivematica 1.0 changed how several processes work, which in turn changed the scope of what must be implemented for forensic disk imaging. This document outlines the mandatory steps that remain for forensic imaging in 1.1, along with some additional steps that would let Archivematica generalize this functionality into the standard transfer.

Necessary improvements

It must be possible to add disk imaging metadata

Forensic image transfers need to let users supply metadata at the beginning of the transfer.

This is partially implemented and needs to be rebased onto the 1.0 branch.

File identification commands must recognize disk images

The new extraction model is based on the FPR and therefore depends on file identification. The identification microservices must be able to identify disk images so that those images can be extracted.

Forensic tools must be packaged

Disk image extraction commands must be added to the FPR

An extraction command using tsk_recover already exists; it allows disk images that The Sleuth Kit can read to be extracted. Commands for other formats may be needed as well.
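As a rough illustration of what a tsk_recover-based extraction command does, here is a minimal sketch. The function names are hypothetical and this is not the command stored in the FPR; it assumes only the standard tsk_recover invocation (`tsk_recover [-a|-e] image output_dir`).

```python
import subprocess
from pathlib import Path


def build_tsk_recover_command(image_path, output_dir, all_files=False):
    """Build a tsk_recover argument list (hypothetical helper)."""
    # -e recovers all files (allocated and unallocated);
    # -a restricts recovery to allocated files only.
    mode_flag = "-e" if all_files else "-a"
    return ["tsk_recover", mode_flag, str(image_path), str(output_dir)]


def extract_disk_image(image_path, output_dir, all_files=False):
    """Run tsk_recover; raises CalledProcessError on a non-zero exit."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_tsk_recover_command(image_path, output_dir, all_files)
    subprocess.run(command, check=True)
```

Keeping the command construction separate from its execution makes the argument list easy to inspect without The Sleuth Kit installed.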

Being tracked in https://projects.artefactual.com/issues/5843

Extracted package deletion must be optional

Currently, whether extracted packages are retained after decompression is hardcoded in the package extraction script, as it was in the old extraction code; the current behaviour is to always delete the package after decompression. Retention must become a user choice in the UI, and should be exposed as a persistent option in the processing configuration.
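One way to picture the change is a flag on the extraction routine in place of the hardcoded delete. The sketch below uses ZIP files and a hypothetical `delete_package` parameter for illustration only; the real extraction script and the way the option reaches it from the processing configuration are not shown here.

```python
import os
import zipfile


def extract_package(package_path, destination, delete_package=True):
    """Extract a ZIP package, then optionally delete the original."""
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(destination)
    # The default mirrors the currently hardcoded behaviour (always delete).
    if delete_package:
        os.remove(package_path)
    return destination
```

Defaulting to deletion would keep existing transfers behaving as they do today unless the user chooses otherwise.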

Being tracked in https://projects.artefactual.com/issues/5894

Users should be offered the choice of whether to extract packages

Whether packages are extracted at all must be a user choice in the UI, and should be exposed as a persistent option in the processing configuration.

Being tracked in https://projects.artefactual.com/issues/5894

New "Examine Contents" microservice must be added

Being tracked in https://projects.artefactual.com/issues/5880

This step runs the bulk_extractor tool and indexes the output to allow for later visualization and examination.
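A minimal sketch of the two halves of this step follows: invoking bulk_extractor and indexing what it writes. The helper names are hypothetical, and the parser assumes feature files are tab-separated (offset, feature, context) with `#` comment lines; how the index is stored for later visualization is not specified here.

```python
from collections import defaultdict


def build_bulk_extractor_command(image_path, report_dir):
    """Build a bulk_extractor argument list (hypothetical helper)."""
    # bulk_extractor writes its feature files into the directory given by -o.
    return ["bulk_extractor", "-o", str(report_dir), str(image_path)]


def index_feature_lines(lines):
    """Map each feature to the list of offsets at which it was found."""
    index = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not an (offset, feature, ...) record
        offset, feature = fields[0], fields[1]
        index[feature].append(offset)
    return dict(index)
```

For example, passing the lines of an email feature file to `index_feature_lines` yields a dictionary keyed by address, which could then be handed to whatever search index is chosen.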

New characterization scripts must be written for fiwalk

Being tracked in https://projects.artefactual.com/issues/5866

It has previously been suggested that Archivematica use Mark Matienzo's fiwalk configuration, which uses FIDO. This may no longer be necessary now that FIDO is implemented as a general identification tool: extracted contents will always be identifiable using FIDO if the user selects it as their identification tool.

Potential improvements

Alternate characterization tools should be implemented

Implemented in https://projects.artefactual.com/issues/5866

Disk image characterization should be done with fiwalk.

Currently the "characterize and extract metadata" step always uses FITS, but 1.0 laid the groundwork for controlling this through the FPR instead. Once that work is completed, characterization of disk images can be controlled simply by writing FPR rules.
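Conceptually, rule-driven characterization amounts to a lookup from identified format to tool, with FITS as the default. The sketch below is only an illustration of that idea; the rule table, format label, and function name are hypothetical and do not reflect the FPR's actual data model.

```python
DEFAULT_TOOL = "FITS"

# Hypothetical rule table: identified format -> characterization tool.
CHARACTERIZATION_RULES = {
    "disk image": "fiwalk",
}


def select_characterization_tool(file_format, rules=CHARACTERIZATION_RULES):
    """Return the tool named by a matching rule, else the default."""
    return rules.get(file_format, DEFAULT_TOOL)
```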

Provide robust identification fallbacks using additional microservice(s)

Currently identification uses a single tool; if that tool fails, the file is not identified. We provide a single-case fallback in the scripts that handle file identification and FIDO. A more robust fallback would be desirable, e.g. allowing individual files to fall back to other IDTools when identification fails. This would let alternate tools provide identification results for things that FIDO currently can't identify, such as disk images, without cluttering the existing scripts.
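The per-file fallback described here could look roughly like the following. This is a sketch under stated assumptions: the identifier callables are stand-ins rather than real FIDO or IDTool integrations, and each is assumed to return a format string or None on failure.

```python
def identify_with_fallback(path, identifiers):
    """Try each (name, callable) pair in order.

    Returns (tool_name, format) from the first tool that succeeds,
    or None if every tool fails.
    """
    for name, identify in identifiers:
        result = identify(path)
        if result is not None:
            return name, result
    return None


# Stand-in identifiers for illustration only.
def primary_stub(path):
    # Pretends to recognise text files and nothing else.
    return "text/plain" if path.endswith(".txt") else None


def disk_image_stub(path):
    # Pretends to recognise raw disk images by extension.
    return "raw disk image" if path.endswith(".dd") else None
```

With `identifiers = [("primary", primary_stub), ("disk-image", disk_image_stub)]`, a `.dd` file that the first tool cannot identify is passed to the second, and neither script needs to know about the other.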

Recursive package extraction

The current package extraction code extracts in a single pass: if a package contains further packages that Archivematica can extract, they are not extracted. The code should be updated to allow extraction of nested packages, for instance ZIP files containing other ZIPs, tarballs in uncommon compression formats (such as .tar.xz), and disk images containing compressed archives.
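To make the nested case concrete, here is a minimal sketch of recursive extraction limited to ZIP files. The function name, the `_extracted` directory suffix, and the `max_depth` guard are all assumptions made for illustration; a real implementation would dispatch on identified format through the FPR rather than on `zipfile.is_zipfile`.

```python
import os
import zipfile


def extract_recursively(package_path, destination, max_depth=5):
    """Extract a ZIP and any ZIPs nested inside it.

    Returns the list of package paths that were extracted. max_depth
    bounds the nesting level to guard against runaway archives.
    """
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(destination)
    extracted = [package_path]
    if max_depth <= 0:
        return extracted
    # Collect nested packages first so directories created by the
    # recursive calls are not walked a second time.
    nested = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(destination)
        for name in files
        if zipfile.is_zipfile(os.path.join(root, name))
    ]
    for nested_path in nested:
        nested_destination = nested_path + "_extracted"
        extracted.extend(
            extract_recursively(nested_path, nested_destination, max_depth - 1)
        )
    return extracted
```

Note that `zipfile.is_zipfile` also matches ZIP-based formats such as office documents, which is one reason format identification would be a better gate than a signature check.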