Normalizing based on FITS output

From Archivematica
Revision as of 19:25, 21 March 2011 by Evelyn McLellan

Normalization based on file extension is problematic: files whose extensions have been changed manually, or that have no extension at all, can't be normalized. An alternative is to choose the normalization path based on FITS output, preferably the DROID/PRONOM format identification. However, FITS output is not always reliable. The table below lists some file types that DROID has trouble identifying, along with the fileUtility and exifTool output for the same files. These matter because, if the DROID identification is wrong or missing, we may fall back on the fileUtility or exifTool output.
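The fallback order described above can be sketched as follows. This is a minimal illustration, not Archivematica code: it assumes the per-tool results have already been extracted from FITS into a plain dictionary keyed by the hypothetical names "DROID", "fileUtility" and "exifTool", and it only handles the *missing* case ("Format not identified"). Detecting a DROID result that is present but wrong needs extra rules, such as the ones the table suggests.

```python
from typing import Optional

# DROID's wording for a failed identification, as recorded in the table below.
UNIDENTIFIED = "Format not identified"

# Preference order: DROID/PRONOM first, then the fallback tools.
# The key names are hypothetical; they are not FITS identifiers.
TOOL_PRIORITY = ("DROID", "fileUtility", "exifTool")


def pick_identification(tool_output: dict) -> Optional[tuple]:
    """Return (tool, identification) for the first tool with a usable result.

    Returns None when no tool produced anything usable.
    """
    for tool in TOOL_PRIORITY:
        value = tool_output.get(tool)
        # Skip tools that produced nothing or reported a failed identification.
        if value and value != UNIDENTIFIED:
            return tool, value
    return None
```

For the EPS row, DROID reports "Format not identified", so the sketch falls back to the fileUtility string ("PostScript document text conforming DSC level 3.1, type EPS, Level 2").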

Format | DROID identification | fileUtility output | exifTool output
AC3 | Format not identified | ATSC A/52 aka AC-3 aka Dolby Digital stream |
AI | PDF | PDF | mimetype: application/PDF
EPS | Format not identified | PostScript document text conforming DSC level 3.1, type EPS, Level 2 | mimetype: application/postscript
NEF | TIFF | TIFF | mimetype: image/x-raw
WMA and WMV | Both identified as ASF (correct, but not granular enough) | Microsoft ASF video (both) | mimetype: audio/x-ms-wma (WMA), mimetype: video/x-ms-wmv (WMV)
XLS | OLE2 Compound Document Format | Microsoft Office Document application | mimetype: application/vnd.ms-excel
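In several rows, the exifTool mimetype is more specific than the DROID result. A sketch of how that could be used to refine a coarse or wrong DROID identification is shown below. The mapping is built only from the exifTool column above, and the output labels ("WMA", "XLS", and so on) are hypothetical normalization keys. Two caveats apply. First, image/x-raw and application/postscript are generic (any camera raw file; any PostScript file), so the sketch labels them broadly rather than as NEF or EPS. Second, AI files cannot be distinguished from PDF by any of the three tools, so they are left as PDF.

```python
from typing import Optional

# Hypothetical mapping from the exifTool mimetypes in the table to normalization keys.
EXIFTOOL_MIMETYPE_TO_FORMAT = {
    "application/postscript": "PostScript",  # fileUtility additionally says "type EPS"
    "image/x-raw": "camera raw",             # NEF is one such format; the mimetype alone doesn't say which
    "audio/x-ms-wma": "WMA",
    "video/x-ms-wmv": "WMV",
    "application/vnd.ms-excel": "XLS",
}

# DROID results that the table shows to be missing, wrong, or too coarse.
# "PDF" is deliberately absent: exifTool also reports PDF for AI files.
UNRELIABLE_DROID_RESULTS = {
    "Format not identified",
    "TIFF",
    "ASF",
    "OLE2 Compound Document Format",
}


def parse_exiftool_mimetype(line: str) -> Optional[str]:
    """Extract the lowercased mimetype from a 'mimetype: ...' string, if present."""
    prefix = "mimetype:"
    if line.lower().startswith(prefix):
        return line[len(prefix):].strip().lower()
    return None


def refine_format(droid: str, exiftool: Optional[str]) -> str:
    """Prefer the exifTool mimetype only when DROID's answer is known to be unreliable."""
    if droid not in UNRELIABLE_DROID_RESULTS or exiftool is None:
        return droid
    mimetype = parse_exiftool_mimetype(exiftool)
    # Keep the DROID result if the mimetype is missing or not in the mapping.
    return EXIFTOOL_MIMETYPE_TO_FORMAT.get(mimetype, droid)
```

The design choice is to trust DROID by default and override it only for the specific results the table flags. This keeps reliable PRONOM identifications from being replaced by less precise tool output.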