Normalizing based on FITS output

From Archivematica

Revision as of 12:04, 22 March 2011

Normalizing by file extension is problematic: files whose extensions were changed manually, or that have no extension at all, cannot be normalized. An alternative is to choose the normalization path from FITS output, preferably the DROID/PRONOM format identification. However, FITS output is not always reliable. The table below lists some file types that DROID has trouble identifying and shows the fileUtility and exifTool output for those files, since we may fall back on either tool when the DROID identification is wrong or missing.

{|
!Format
!DROID identification
!fileUtility output
!exifTool output
|-
|AC3
|Format not identified
|ATSC A/52 aka AC-3 aka Dolby Digital stream
|unknown file type
|-
|AI
|PDF
|PDF
|mimetype: application/PDF
|-
|EPS
|Format not identified
|PostScript document text conforming DSC level 3.1, type EPS, Level 2
|mimetype: application/postscript
|-
|NEF
|TIFF
|TIFF
|mimetype: image/x-raw
|-
|DOCX, PPTX, XLSX
|Microsoft Office Open XML (does not distinguish between them)
|Zip archive data
|mimetype: application/zip
|-
|WMA and WMV
|Both identified as ASF (correct, but not granular enough)
|Microsoft ASF video (both)
|mimetype: audio/x-ms-wma, mimetype: video/x-ms-wmv
|-
|XLS
|OLE2 Compound Document Format
|Microsoft Office Document application
|mimetype: application/vnd.ms-excel
|}
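
The fallback idea described above could be sketched roughly as follows. This is only an illustration, not Archivematica code: the function names, the set of "non-answer" strings (taken from the table), and the order in which fileUtility is tried before exifTool are all assumptions. The three tool outputs are passed in as plain strings; how they are extracted from the FITS output is left out.

```python
from typing import Optional, Tuple

# Assumption: strings that mean "the tool could not identify the file",
# taken from the table above and compared case-insensitively.
UNIDENTIFIED = {"", "format not identified", "unknown file type"}


def is_usable(value: Optional[str]) -> bool:
    """Return True if a tool reported something other than a non-answer."""
    return value is not None and value.strip().lower() not in UNIDENTIFIED


def choose_identification(
    droid: Optional[str],
    file_utility: Optional[str],
    exiftool: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Prefer DROID, then fall back on fileUtility, then exifTool.

    The fallback order is an assumption made for this sketch.
    Returns (tool name, identification), or None if no tool gave an answer.
    """
    candidates = [
        ("DROID", droid),
        ("fileUtility", file_utility),
        ("exifTool", exiftool),
    ]
    for tool, value in candidates:
        if is_usable(value):
            return tool, value.strip()
    return None


# AC3 row: DROID gives no identification, so fileUtility is used instead.
ac3 = choose_identification(
    "Format not identified",
    "ATSC A/52 aka AC-3 aka Dolby Digital stream",
    "unknown file type",
)

# NEF row: DROID answers "TIFF", so that answer is returned as-is.
nef = choose_identification("TIFF", "TIFF", "mimetype: image/x-raw")
```

Note that this sketch only covers a missing DROID identification. For the NEF row it returns DROID's "TIFF", so identifications that are wrong or not granular enough (NEF, WMA and WMV, DOCX, PPTX, XLSX) would need additional rules.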