Difference between revisions of "Normalizing based on FITS output"

From Archivematica
Normalization based on file extension is problematic, because files with manually changed extensions or no extensions cannot be normalized. An alternative is to base the normalization path on FITS output, preferably the DROID/PRONOM format identification. However, FITS output is not always reliable. This table identifies some file types that DROID has trouble identifying and shows the fileUtility and exifTool output for those files, since we may fall back on that output when the DROID identification is wrong or missing.

Revision as of 11:30, 4 July 2011


This table shows the DROID, fileUtility and exifTool output for each file extension for which Archivematica has preservation and access plans.

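{| border="1" cellpadding="10" cellspacing="0"
|- style="background-color:#cccccc;"
!style="width:10%"|'''Media type'''
!style="width:10%"|'''Extension'''
!style="width:20%"|'''DROID identification'''
!style="width:20%"|'''fileUtility output'''
!style="width:20%"|'''exifTool output'''
!style="width:20%"|'''Notes'''
|-
| Audio || AC3 || Format not identified || ATSC A/52 aka AC-3 aka Dolby Digital stream || unknown file type ||
|-
| || AIFF || || || ||
|-
| || MP3 || || || ||
|-
| || WAV || || || ||
|-
| || WMA || ASF || Microsoft ASF video (both) || mimetype: audio/x-ms-wma, mimetype: video/x-ms-wmv ||
|-
| Email || PST || || || ||
|-
| Office Open XML || DOCX || Microsoft Office Open XML (does not distinguish between them) || Zip archive data || mimetype: application/zip ||
|-
| Office Open XML || PPTX || Microsoft Office Open XML (does not distinguish between them) || Zip archive data || mimetype: application/zip ||
|-
| Office Open XML || XLSX || Microsoft Office Open XML (does not distinguish between them) || Zip archive data || mimetype: application/zip ||
|-
| Portable Document Format || || || || ||
|-
| Presentation || || || || ||
|-
| Raster image || || || || ||
|-
| Raw camera || || || || ||
|-
| Spreadsheet || || || || ||
|-
| Vector image || AI || PDF || PDF || mimetype: application/PDF ||
|-
| Video || || || || ||
|-
| Word processing || || || || ||
|-
| || DOCX, PPTX, XLSX || Microsoft Office Open XML (does not distinguish between them) || Zip archive data || mimetype: application/zip ||
|-
| || EPS || Format not identified || PostScript document text conforming DSC level 3.1, type EPS, Level 2 || mimetype: application/postscript ||
|-
| || NEF || TIFF || TIFF || mimetype: image/x-raw ||
|-
| || WMA and WMV || Both identified as ASF (correct, but not granular enough) || Microsoft ASF video (both) || mimetype: audio/x-ms-wma, mimetype: video/x-ms-wmv ||
|-
| || XLS || OLE2 Compound Document Format || Microsoft Office Document application || mimetype: application/vnd.ms-excel ||
|}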
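
Since the table is meant to support falling back on fileUtility or exifTool output when the DROID identification is missing, the selection step might look roughly like the Python sketch below. It is not Archivematica code: the function names, the set of "failure" strings (taken from the AC3 and EPS rows) and the order in which the two fallback tools are tried are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Strings the table shows when a tool fails to identify a file.
# Treating them as "no answer" is an assumption of this sketch.
UNIDENTIFIED = {"", "format not identified", "unknown file type"}


def is_usable(value: Optional[str]) -> bool:
    """Return True if a tool reported something other than a failure message."""
    return value is not None and value.strip().lower() not in UNIDENTIFIED


def pick_identification(
    droid: Optional[str],
    file_utility: Optional[str],
    exiftool: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Return (tool_name, output) from the first tool with a usable answer.

    DROID is preferred; the order of the two fallbacks is hypothetical.
    """
    candidates = (
        ("DROID", droid),
        ("fileUtility", file_utility),
        ("exifTool", exiftool),
    )
    for tool, value in candidates:
        if is_usable(value):
            return tool, value.strip()
    return None  # no tool identified the file


# AC3 row from the table: DROID fails, so fileUtility output is used.
ac3 = pick_identification(
    "Format not identified",
    "ATSC A/52 aka AC-3 aka Dolby Digital stream",
    "unknown file type",
)
print(ac3)
```

This sketch only covers missing identifications. It would not catch an identification that is present but wrong or too coarse, such as NEF reported as TIFF or WMA and WMV both reported as ASF; those rows would need additional rules.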