Transcoder

From Archivematica
Revision as of 11:38, 8 March 2011 by Misty De Meo (talk | contribs) (→‎transcoderConfig.conf: Added additional editing documentation)

Transcode: convert (language or information) from one form of coded representation to another. [Source: Oxford English Dictionary]

Overview

The transcoder is developed by Artefactual to normalize files and generate access copies in the Archivematica system. Earlier versions were called the normalizer. It tries to identify each file's type from its file extension or other metadata, then looks up the actions configured for that type. It performs those actions and exits with a zero status if it believes they completed successfully.
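The flow described above (identify by extension, look up an action, report success through the exit status) can be sketched in shell. This is only an illustration, not the transcoder's actual code: the function names actionForFile and transcodeFile, the extension-to-action mapping, and the choice to return a non-zero status when no action matches are all assumptions made for the example.

```shell
# Hypothetical lookup: map a file extension to the name of a configured action.
# The extensions and action names below are illustrative only.
actionForFile() {
    fileName="$1"
    # Take everything after the last dot and lowercase it.
    extension="$(printf '%s' "${fileName##*.}" | tr '[:upper:]' '[:lower:]')"
    case "$extension" in
        tif|tiff|bmp) echo "image" ;;
        wav|avi)      echo "audiovisual" ;;
        doc|odt)      echo "document" ;;
        *)            return 1 ;;   # no action configured for this type
    esac
}

# Hypothetical driver: return 0 only if an action was found and "performed".
transcodeFile() {
    action="$(actionForFile "$1")" || {
        echo "no action configured for $1" >&2
        return 1
    }
    # A real tool would invoke a converter here; the sketch only reports it.
    echo "would run $action action on $1"
    return 0
}

transcodeFile "scan.TIF"
```

A caller can then branch on the exit status, for example with `transcodeFile "$f" || echo "failed: $f"`.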

Development

To manage the complexity of automating the link between file identification and actions, a database-backed implementation of the transcoder is currently being built to replace the existing XML-based one.

Configuration

Configuration files are located in the /etc/transcoder/ directory.

transcoderConfig.conf

transcoderConfig.conf is the primary transcoder configuration file. It is a bash script which defines the variables used in the various file format policy XML files; it primarily contains paths to conversion tools and standard file names.

Variables are stored as standard bash shell script variables. Variables can be added or edited using any text editor; any new variables added become available for use in format policy XML files. They use the format:

variableName="variable contents"
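As a minimal sketch of this format in use, the example below writes two variables to a temporary file standing in for transcoderConfig.conf and loads them by sourcing the file. The variable myToolPath and its value are hypothetical, and how the transcoder itself reads the file is not shown here; sourcing is simply the usual way to load a shell-variable file.

```shell
# Write a small config in the variableName="variable contents" format.
# A temporary file stands in for /etc/transcoder/transcoderConfig.conf.
configFile="$(mktemp)"
cat > "$configFile" <<'EOF'
convertPath="/usr/bin/convert "
myToolPath="/opt/example/mytool "
EOF

# Load the variables into the current shell ("." is equivalent to "source" in bash).
. "$configFile"

# Brackets make the trailing space visible.
printf 'convertPath=[%s]\n' "$convertPath"
printf 'myToolPath=[%s]\n' "$myToolPath"

rm -f "$configFile"
```

Because the values are quoted, a trailing space inside the quotes is preserved when the file is loaded.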

Default variables:

Variable | Description | Default value
formatPoliciesPath | Directory containing format policy XML files | /etc/transcoder/archivematicaFormatPolicies/
transcoderScriptsDir | Directory containing transcoder normalization scripts | /usr/lib/transcoder/transcoderScripts/
convertPath | Path to ImageMagick for image conversion. Requires a space at the end. | /usr/bin/convert
ffmpegPath | Path to ffmpeg for audio and video. Requires a space at the end. | /usr/bin/ffmpeg
theoraPath | Path to ffmpeg2theora script to create Ogg Theora and Vorbis files. Currently unused. Requires a space at the end. | /usr/bin/ffmpeg2theora
unoconvPath | Path to unoconv binary for converting document files. Currently unused. Requires a space at the end. | /usr/bin/unoconv
unoconvAlternatePath | Path to unoconv launcher script for converting document files. Requires a space at the end. | /usr/lib/transcoder/transcoderScripts/unoconvAlternative.sh
DublinCore | File name for Dublin Core metadata | dublincore.xml
MD5FileName | File name containing SIP MD5 checksum | MD5checksum.txt
fileUUIDHumanReadable | Log file containing unique IDs for items within a SIP | FileUUIDs.log
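The page does not say why the tool paths require a space at the end. One plausible reading, offered here only as an assumption, is that a command line is built by placing the variable directly in front of its arguments. The sketch below shows what happens to such a string with and without the trailing space; the file names and the way the command is assembled are illustrative and not taken from the format policy files.

```shell
# Illustrative file names, not taken from the transcoder.
inputFile="image.tif"
outputFile="image.jpg"

# With the trailing space, the tool path and first argument stay separate.
convertPath="/usr/bin/convert "
command="${convertPath}${inputFile} ${outputFile}"
echo "$command"

# Without it, the path and the first argument run together.
convertPathNoSpace="/usr/bin/convert"
brokenCommand="${convertPathNoSpace}${inputFile} ${outputFile}"
echo "$brokenCommand"
```

The first echo prints `/usr/bin/convert image.tif image.jpg`; the second prints `/usr/bin/convertimage.tif image.jpg`, which would not name a valid executable.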