Significant characteristics of word processing files
- "[T]he essential characteristics of a word processing document may include the textual content; formatting such as bolded text, font type and size; layout; bulleting; colour and embedded graphics." An Approach to the Preservation of Digital Records, National Archives of Australia, 2002
- Document Metadata: Document Technical Metadata for Digital Preservation, Florida Digital Archive and Harvard University Library, 2009. This document proposes technical metadata for textual records that "can be used to verify the result of document transformations, ensuring the properties of the original document are preserved and properly transformed to the new document format." The metadata in this table are adapted from that source:
Semantic unit | Description | Obligation | Characteristic | Note |
---|---|---|---|---|
PageCount | Total number of pages in the document | Mandatory | Structure | |
WordCount | Total number of words in the document | Optional | Structure | This element is included because it can help evaluate the completeness of the content after a transformation. Use it with caution, however: tools and applications that count the words in a document do not always use the same algorithm, so their values may differ for the same document. |
CharacterCount | Total number of characters in the document | Optional | Structure | See note for WordCount, above |
ParagraphCount | Total number of paragraphs in the document | Optional | Structure | See note for WordCount, above |
LineCount | Total number of lines in the document | Optional | Structure | See note for WordCount, above |
TableCount | Total number of tables in the document | Optional | Structure | See note for WordCount, above |
GraphicsCount | Total number of graphics in the document | Optional | Structure | See note for WordCount, above |
Language | A language identifier specifying the natural language used in the document | Optional | Content | |
Fonts (FontName, isEmbedded) | A list of fonts used in the document; An indication of whether or not a font is embedded in a document | Mandatory | Content, Appearance | This element allows a repository to store the names of all fonts used in a document. Some repositories may choose to store only the non-embedded fonts. It is recommended that repositories record at least the non-embedded fonts to assist in identifying the documents with potential long-term preservation risks. |
Features | Additional document features as follows: hasLayers, hasTransparency, hasOutline, hasForms, hasAnnotations | Optional | | |
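
As an illustration of how the metadata in the table might be used to check a transformation, the following is a minimal Python sketch. It is not taken from the Florida Digital Archive and Harvard University Library document: the `DocumentMetadata` class, the `compare_metadata` function and the 2% word count tolerance are all hypothetical choices made for this example. The two word-counting functions show why WordCount values from different tools may not agree.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical container for a few semantic units from the table."""
    page_count: int                       # PageCount (mandatory)
    word_count: Optional[int] = None      # WordCount (optional)
    # Fonts: maps FontName to isEmbedded
    fonts: Dict[str, bool] = field(default_factory=dict)


def compare_metadata(original: DocumentMetadata,
                     converted: DocumentMetadata,
                     word_tolerance: float = 0.02) -> List[str]:
    """Return a list of discrepancies between two metadata records.

    word_tolerance is an assumed allowance (2% by default) for
    differences between word-counting algorithms.
    """
    issues = []
    if original.page_count != converted.page_count:
        issues.append(
            f"PageCount changed: {original.page_count} -> {converted.page_count}")
    if original.word_count is not None and converted.word_count is not None:
        allowed = original.word_count * word_tolerance
        if abs(original.word_count - converted.word_count) > allowed:
            issues.append(
                f"WordCount changed: {original.word_count} -> {converted.word_count}")
    missing = sorted(set(original.fonts) - set(converted.fonts))
    if missing:
        issues.append("Fonts missing after transformation: " + ", ".join(missing))
    return issues


def non_embedded_fonts(meta: DocumentMetadata) -> List[str]:
    """List the fonts that are not embedded (a potential preservation risk)."""
    return sorted(name for name, embedded in meta.fonts.items() if not embedded)


def count_words_whitespace(text: str) -> int:
    """Count words by splitting on whitespace."""
    return len(text.split())


def count_words_alnum(text: str) -> int:
    """Count words as runs of ASCII letters and digits (splits on hyphens)."""
    return len(re.findall(r"[A-Za-z0-9]+", text))
```

For the text "A well-known issue", `count_words_whitespace` returns 3 and `count_words_alnum` returns 4, which is the kind of discrepancy the WordCount note warns about. A repository using a comparison like this would need to choose its own tolerance, or compare counts produced by the same tool before and after transformation.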