User:Mdemeo/Characterization

From Archivematica

Characterization is the process of producing technical metadata for an object. Archivematica's characterization aims both to document the object's significant properties and to extract technical metadata contained within the object.

Prior to Archivematica 1.2, the characterization microservice always ran the FITS tool. As of Archivematica 1.2, characterization is fully customizable by the Archivematica administrator.

Characterization tools

Archivematica 1.2 ships with four characterization tools. Which tool will run on a given file depends on the type of file, as determined by Archivematica's identification tool.

Default

The default characterization tool is FITS; it will be used if no specific characterization rule exists for the file being scanned.

It is possible to create new default characterization commands, which can either replace FITS or run alongside it on every file.

Multimedia

Archivematica 1.2 introduces three new multimedia characterization tools. These tools were selected for their rich metadata extraction, as well as for their speed. Depending on the type of the file being scanned, one or more of these tools may be called instead of FITS.

  • FFprobe, a characterization tool built on top of the same core as FFmpeg, the normalization software used by Archivematica
  • MediaInfo, a characterization tool oriented towards audio and video data
  • ExifTool, a characterization tool oriented towards still image data and extraction of embedded metadata
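The selection behaviour described above (a format-specific rule where one exists, FITS otherwise) can be pictured as a simple lookup with a fallback. The following is only an illustrative sketch, not Archivematica's actual implementation: the format labels, the `RULES` table, the tool pairings in it, and the `tools_for` function are all hypothetical names invented for this example.

```python
# Illustrative sketch only: format labels and tool pairings are hypothetical,
# not the rules that ship with Archivematica.
DEFAULT_TOOLS = ["FITS"]

RULES = {
    "example-video-format": ["FFprobe", "MediaInfo"],
    "example-image-format": ["ExifTool"],
}


def tools_for(format_id, rules=RULES, default=DEFAULT_TOOLS):
    """Return the tools to run for a format, falling back to the default."""
    # A specific rule takes the place of the default tool.
    return list(rules.get(format_id, default))


if __name__ == "__main__":
    print(tools_for("example-image-format"))
    print(tools_for("some-unlisted-format"))
```

A format with no entry in the table falls through to the default list, which mirrors how FITS is used when no specific characterization rule exists.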

Writing a new characterization command

Information on writing new characterization commands can be found in the FPR administrator's manual.

Writing a characterization command is very similar to writing an identification command or a normalization command. Like an identification command, a characterization command is designed to run a tool and write its output to standard output. Output from characterization commands is expected to be valid XML, and will be included in the AIP's METS document within the file's <objectCharacteristicsExtension> element.

When creating a characterization command, the "output format" should be set to "XML 1.0".
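As a rough illustration of the shape such a command might take, the sketch below runs a tool on a file, checks that the output parses as XML, and returns it for writing to standard output. It rests on several assumptions: the function names are hypothetical, the default tool invocation (`mediainfo --Output=XML`) is just one possible choice, the way the file path reaches the script is not taken from the FPR manual, and the check covers only well-formedness, not validity against a schema.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def is_well_formed_xml(data):
    """Return True if data (bytes or str) parses as XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def characterize(path, tool_argv=("mediainfo", "--Output=XML")):
    """Run the tool on path and return its XML output as bytes.

    tool_argv is an assumption; substitute whichever tool the command wraps.
    """
    result = subprocess.run(
        [*tool_argv, path], capture_output=True, check=True
    )
    if not is_well_formed_xml(result.stdout):
        raise ValueError("tool output is not well-formed XML")
    return result.stdout


def main(argv):
    """Hypothetical entry point: expects the file path as the only argument."""
    if len(argv) != 2:
        print("usage: characterize.py FILE", file=sys.stderr)
        return 2
    sys.stdout.buffer.write(characterize(argv[1]))
    return 0
```

Rejecting malformed output before it is printed keeps broken XML out of whatever consumes the command's standard output. A wrapper script would call `main(sys.argv)` and exit with its return value.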