Word processing files

From Archivematica



Significant characteristics of word processing files

Preservation Format

  • Open Document Format (ODT) for WordPerfect files (WPD)
  • Original format for DOC (starting in Archivematica 0.8)
  • Original format for RTF

Access Format

PDF/A

Normalization tool

Unoconv/OpenOffice Writer (tool search in progress)

Comments

  • Unoconv is a command-line tool that drives OpenOffice, which performs the conversions to both ODF and PDF/A
  • Files normalized to ODF are saved as ODT, the file extension for Open Document Format text documents
  • OOXML and DOC files are left in their original format owing to their ubiquity and ongoing support by Microsoft
  • Normalization to Portable Document Format/Archival (PDF/A) may be an acceptable preservation strategy in addition to normalization to ODF using unoconv and OpenOffice.
    • For more information on the PDF/A format see Library of Congress Sustainability of Digital Formats: PDF/A-1.
    • PDF/A normalization of MS Word files is somewhat problematic because best results are achieved from within the native application - i.e. MS Office running in MS Windows. Archivematica does not support either Windows or MS Office since these are proprietary software packages.
  • Rich Text Format is still heavily used and can be opened in a large number of software programs. Although it is proprietary, it has a published, freely available format specification. In contrast, we'll continue to normalize WordPerfect files to ODT because there is no published specification for the WordPerfect format.
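
As a rough illustration of the policy described above, the following Python sketch maps a file extension to the unoconv command lines that would produce the preservation and access copies. It is not taken from Archivematica's code. The function name `unoconv_commands`, the `PRESERVATION_TARGET` table, and the use of `.docx` to stand for OOXML are assumptions made for this example. The `-e SelectPdfVersion=1` export option, which asks the office suite's PDF export filter for PDF/A-1, is likewise an assumption about how PDF/A output might be requested, since this page does not state which flags are used.

```python
from pathlib import PurePath

# Preservation policy by extension, following the list on this page.
# "odt" means normalize to ODF; None means keep the original format.
# (.docx standing in for OOXML is an assumption for this sketch.)
PRESERVATION_TARGET = {
    ".wpd": "odt",
    ".doc": None,
    ".docx": None,
    ".rtf": None,
}


def unoconv_commands(filename):
    """Return the unoconv command lines (as argument lists) for one file."""
    ext = PurePath(filename).suffix.lower()
    if ext not in PRESERVATION_TARGET:
        raise ValueError(f"no word processing policy for {ext!r}")

    commands = []
    target = PRESERVATION_TARGET[ext]
    if target is not None:
        # Preservation copy: convert to Open Document Format.
        commands.append(["unoconv", "-f", target, filename])

    # Access copy: PDF export with the PDF/A-1 option (assumed flag usage).
    commands.append(
        ["unoconv", "-f", "pdf", "-e", "SelectPdfVersion=1", filename]
    )
    return commands
```

Each returned list could be passed to `subprocess.run` on a machine where unoconv and an office suite are installed. Building the commands separately from running them keeps the policy table easy to inspect and test.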