Word processing files

From Archivematica
Latest revision as of 13:37, 26 November 2013


Significant characteristics of word processing files

Preservation Format

  • Open Document Format (ODT) for WordPerfect (WPD) files
  • Original format for DOC files (starting in Archivematica 0.8)
  • Original format for RTF files (kept as RTF)
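The preservation rules above amount to a small lookup table. The Python sketch below encodes them that way for illustration only. It assumes, as a simplification, that files are classified by lower-case file extension; the names `PRESERVATION_POLICY` and `preservation_action` are hypothetical and are not taken from Archivematica. The `.docx` entry reflects the comment on OOXML files in the Comments section of this page.

```python
# Hypothetical encoding of the word processing preservation policy.
# Keying on file extensions is a simplification for illustration;
# it is not how Archivematica itself identifies formats.
from pathlib import Path

PRESERVATION_POLICY = {
    ".wpd": "normalize to ODT",  # WordPerfect: no published specification
    ".doc": "keep original",     # starting in Archivematica 0.8
    ".docx": "keep original",    # OOXML, left in its original format
    ".rtf": "keep original",     # published, freely available specification
}


def preservation_action(filename: str) -> str:
    """Return the preservation action for a file, or 'unknown' if not covered."""
    suffix = Path(filename).suffix.lower()  # e.g. "memo.WPD" -> ".wpd"
    return PRESERVATION_POLICY.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ["memo.WPD", "report.doc", "notes.rtf", "draft.docx", "slides.ppt"]:
        print(name, "->", preservation_action(name))
```

Anything outside the table falls through to "unknown", so a file such as `slides.ppt` is flagged instead of being silently assigned a word processing rule.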

Access Format

PDF/A

Normalization tool

Tool search in progress (formerly Unoconv/OpenOffice Writer)

Comments

  • Unoconv is used as a command-line front end to OpenOffice, which performs the conversions to both ODF and PDF/A.
  • Files are converted to ODT (.odt), the Open Document Format file type for text documents.
  • OOXML and DOC files are left in their original format owing to their ubiquity and ongoing support from Microsoft.
  • Normalization to PDF/A (the archival form of Portable Document Format) may be an acceptable preservation strategy in addition to normalization to ODF using unoconv and OpenOffice.
    • For more information on the PDF/A format, see Library of Congress, Sustainability of Digital Formats: PDF/A-1 (http://www.digitalpreservation.gov/formats/fdd/fdd000125.shtml).
    • PDF/A normalization of MS Word files is somewhat problematic because the best results come from the native application, i.e. MS Office running on MS Windows. Archivematica supports neither Windows nor MS Office, since both are proprietary software.
  • Rich Text Format is still heavily used and can be opened by a large number of software programs. Although it is proprietary, it has a published, freely available format specification, so RTF files are kept in their original format. In contrast, we will continue to normalize WordPerfect files to .odt because there is no published specification for the WordPerfect format.
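Because the comments describe unoconv as the command-line front end that drives OpenOffice for both conversions, the Python sketch below shows how such a call might be assembled. It is a minimal sketch under stated assumptions: it only builds the argument list and runs nothing. Executing it requires unoconv plus an OpenOffice/LibreOffice installation. The function name `unoconv_command` and the `"pdfa"` label are hypothetical, and this is not Archivematica's own normalization command, which this page does not give. The `SelectPdfVersion=1` export option (PDF/A-1 in the OpenOffice/LibreOffice PDF export filter) is an assumption worth checking against the installed version.

```python
# Hypothetical helper: builds, but does not execute, a unoconv command line.
from __future__ import annotations

import shlex


def unoconv_command(source: str, target_format: str, output_dir: str) -> list[str]:
    """Return the argv for converting `source` with unoconv.

    "odt"  -> OpenDocument Text copy (preservation format)
    "pdfa" -> PDF copy with SelectPdfVersion=1, assumed here to mean PDF/A-1
    """
    cmd = ["unoconv", "-o", output_dir]  # -o: output directory or filename
    if target_format == "odt":
        cmd += ["-f", "odt"]
    elif target_format == "pdfa":
        # -e passes an export filter option through to OpenOffice/LibreOffice
        cmd += ["-f", "pdf", "-e", "SelectPdfVersion=1"]
    else:
        raise ValueError(f"unsupported target format: {target_format}")
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    print(shlex.join(unoconv_command("memo.wpd", "odt", "out")))
    print(shlex.join(unoconv_command("memo.wpd", "pdfa", "out")))
```

Returning a list instead of a shell string means the command can be handed straight to `subprocess.run(cmd, check=True)` without shell quoting issues; the first usage line prints `unoconv -o out -f odt memo.wpd`.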