Significant characteristics of word processing files

From Archivematica
Revision as of 16:35, 11 February 2020 by Sallain

This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.


| Semantic unit | Description | Obligation | Characteristic | Note |
| --- | --- | --- | --- | --- |
| PageCount | Total number of pages in the document | Mandatory | Structure | |
| WordCount | Total number of words in the document | Optional | Structure | Included because it can help evaluate whether content is complete after a transformation. Use it with caution, however: tools and applications that count words do not always use the same algorithm, so counts from different tools may not agree. |
| CharacterCount | Total number of characters in the document | Optional | Structure | See the note for WordCount, above. |
| ParagraphCount | Total number of paragraphs in the document | Optional | Structure | See the note for WordCount, above. |
| LineCount | Total number of lines in the document | Optional | Structure | See the note for WordCount, above. |
| TableCount | Total number of tables in the document | Optional | Structure | See the note for WordCount, above. |
| GraphicsCount | Total number of graphics in the document | Optional | Structure | See the note for WordCount, above. |
| Language | A language identifier specifying the natural language used in the document | Optional | Content | |
| Fonts (FontName, isEmbedded) | A list of the fonts used in the document (FontName), with an indication of whether each font is embedded in the document (isEmbedded) | Mandatory | Content, Appearance | Allows a repository to store the names of all fonts used in a document. Some repositories may choose to store only the non-embedded fonts. It is recommended that repositories record at least the non-embedded fonts, to help identify documents with potential long-term preservation risks. |
| Features | Additional document features: hasLayers, hasTransparency, hasOutline, hasForms, hasAnnotations | Optional | Varies by feature (see below) | |

Characteristic of each feature:
  • hasLayers: Appearance
  • hasTransparency: Appearance
  • hasOutline: Behaviour, Appearance
  • hasForms: Content
  • hasAnnotations: Content
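The WordCount note warns that word-counting tools do not agree with one another. The minimal Python sketch below shows why. The two counting rules are hypothetical illustrations, not the algorithm of any particular tool, yet they give different totals for the same sentence.

```python
import re

# Two hypothetical word-counting rules. Real tools use their own,
# often undocumented, rules; these only illustrate that rules differ.


def count_whitespace_tokens(text: str) -> int:
    """Count runs of non-whitespace characters as words."""
    return len(text.split())


def count_alphanumeric_words(text: str) -> int:
    """Count runs of word characters, so hyphens and apostrophes split words."""
    return len(re.findall(r"\w+", text))


sample = "Long-term preservation isn't trivial - it's a policy decision."
print(count_whitespace_tokens(sample))   # → 9
print(count_alphanumeric_words(sample))  # → 11
```

When a count is used to check completeness after a transformation, it is probably safest to compare values produced by the same tool, using the same rule, before and after the transformation, rather than values from different tools.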
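A repository might hold these characteristics as a simple record, as in the hypothetical Python sketch below. The class and field names are snake_case renderings of the semantic units in the table, not part of any Archivematica API. The table does not name an identifier scheme for Language, so the example code in the comment is an assumption. The `non_embedded_fonts` method lists the fonts that the Fonts note recommends recording as potential preservation risks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Font:
    """One entry of the Fonts element (hypothetical structure)."""
    font_name: str     # FontName
    is_embedded: bool  # isEmbedded


@dataclass
class WordProcessingCharacteristics:
    """Hypothetical record of the significant characteristics in the table."""
    page_count: int                        # PageCount (mandatory)
    fonts: List[Font]                      # Fonts (mandatory)
    word_count: Optional[int] = None       # WordCount (optional)
    character_count: Optional[int] = None  # CharacterCount (optional)
    paragraph_count: Optional[int] = None  # ParagraphCount (optional)
    line_count: Optional[int] = None       # LineCount (optional)
    table_count: Optional[int] = None      # TableCount (optional)
    graphics_count: Optional[int] = None   # GraphicsCount (optional)
    # Language (optional); the identifier scheme (e.g. "en") is an assumption.
    language: Optional[str] = None
    # Features (optional), e.g. {"hasForms": True, "hasAnnotations": False}.
    features: Dict[str, bool] = field(default_factory=dict)

    def non_embedded_fonts(self) -> List[str]:
        """Names of fonts that are not embedded (potential preservation risks)."""
        return [f.font_name for f in self.fonts if not f.is_embedded]


# Usage with made-up values.
doc = WordProcessingCharacteristics(
    page_count=12,
    fonts=[Font("Times New Roman", True), Font("Calibri", False)],
    word_count=3400,
    features={"hasAnnotations": True},
)
print(doc.non_embedded_fonts())  # ['Calibri']
```

Mandatory elements are required constructor arguments and optional elements default to `None` (or an empty feature map), so the record mirrors the Obligation column of the table.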