Transcoder
Transcode: convert (language or information) from one form of coded representation to another. [Source: Oxford English Dictionary]
Overview
The transcoder is developed by Artefactual for normalization and for generating access copies in the Archivematica system. In earlier versions it was called the normalizer. It tries to identify the file type from the file extension or other metadata, then looks up the actions configured for the identified type. It performs those actions and exits with a zero status if it believes they completed successfully.
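The identify-then-act flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the transcoder's actual code: the `rules` mapping, the `identify` and `transcode` names, and the choice to return 0 when no action is configured are all assumptions made for the example, and identification by other metadata is left out.

```python
import os


def identify(path):
    """Identify a file by its lowercased extension, without the leading dot."""
    return os.path.splitext(path)[1].lower().lstrip(".")


def transcode(path, rules):
    """Run every action configured for the file's type.

    rules is a hypothetical mapping of extension -> list of callables;
    each callable takes the path and returns True on success.
    Returns 0 if all actions succeeded, 1 otherwise (exit-status style).
    """
    actions = rules.get(identify(path), [])  # assumption: no rule means nothing to do
    for action in actions:
        if not action(path):
            return 1
    return 0
```

A caller would pass the result to `sys.exit()` so that a zero exit status signals success, as described above.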
Transcoder Database
In Archivematica release 0.7.1 alpha, the normalization rules were moved to a database and can be viewed under the preservation planning tab on the dashboard. In future releases, we plan to support modifying these rules through the dashboard interface.
Database Schema
Configuration
Configuration files are located in the /etc/transcoder/ directory.
The transcoder database credentials and server can be set in the dbsettings.conf file.
Development
In the 0.9 release the transcoder was integrated with the MCP.
During transfer processing, the fileIDs are identified by microservices. They are stored against the file in the FilesIDentifiedIDs table.
For normalization processing, the MCP processes a chain for each file. The normalization job for a file checks for command relationships that match the identified file IDs and the appropriate command classification (normalize preservation, normalize access). For every unique command found in those relationships, the MCP creates a task to be executed by the client. If no commands are identified, the MCP creates a task with the default command from the DefaultCommandsForClassifcations table, if one is defined.
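The selection rule in the paragraph above (unique matching commands first, then the default command as a fallback) can be illustrated with plain Python data. This is only a sketch: the function name, the tuple layout standing in for command relationship rows, and the `defaults` dictionary standing in for the default-commands table are assumptions, not the MCP's real data model.

```python
def commands_for_file(file_ids, classification, relationships, defaults):
    """Return the unique commands to run for one file and one classification.

    file_ids: set of file IDs identified for the file.
    relationships: iterable of (file_id, classification, command) tuples
        (hypothetical stand-in for command relationship rows).
    defaults: dict of classification -> default command
        (hypothetical stand-in for the default-commands table).
    """
    found = []
    for file_id, rel_classification, command in relationships:
        matches = file_id in file_ids and rel_classification == classification
        if matches and command not in found:
            found.append(command)  # one task per unique command
    if found:
        return found
    default = defaults.get(classification)
    return [default] if default is not None else []
```

Each returned command would correspond to one task created for the client; an empty list means no task is created because no default is defined.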
Integration with the MCP was done by relating commands to microservice chain links. The transcoder links (MicroServiceChainLinks of this type) have a one-to-one relationship with the TasksConfigs, which have a one-to-one relationship with the CommandRelationships. The protocol between the client and server is based on the CommandRelationships pk: the MCP assigns a task to the client to perform command relationship x on file y (identified by fileUUID). The client can then pull the information required to execute the command from the database.
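To make the protocol concrete, the sketch below shows a client resolving a task from nothing more than a CommandRelationships pk and a fileUUID. It uses an in-memory SQLite database purely so the example runs anywhere; the reduced table layouts, the demo rows, and the `build_demo_db` and `resolve_task` names are assumptions for illustration and do not reproduce the real schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the transcoder database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Commands (pk INTEGER PRIMARY KEY, command TEXT, description TEXT);
        CREATE TABLE CommandRelationships (pk INTEGER PRIMARY KEY, command INTEGER, fileID INTEGER);
        INSERT INTO Commands VALUES (1, 'echo "%fileFullName%"', 'Demo command');
        INSERT INTO CommandRelationships VALUES (7, 1, 3);
    """)
    return db


def resolve_task(db, command_relationship_pk, file_uuid):
    """Look up the command a client should run for an assigned task."""
    row = db.execute(
        "SELECT c.command, c.description "
        "FROM CommandRelationships cr JOIN Commands c ON c.pk = cr.command "
        "WHERE cr.pk = ?",
        (command_relationship_pk,),
    ).fetchone()
    if row is None:
        raise LookupError("unknown command relationship: %r" % (command_relationship_pk,))
    return {"fileUUID": file_uuid, "command": row[0], "description": row[1]}
```

The point of the design is that the task message stays small: only two identifiers travel between server and client, and everything else is read from the shared database.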
Why the change? So that not every client has to support every normalization tool, tasks need to be assigned by tool availability. Currently the archivematica-client package depends on all the required tools, but there are situations where assignment by tool availability will be required. One example, not currently implemented, would be normalizing on a Windows machine using Microsoft Office. The Windows machine could theoretically run a client, but it could not support the standard Archivematica tools, as they are Linux based. To differentiate the two, use the supportedBy field in the Commands table.
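As a rough illustration of assignment by tool availability, the sketch below matches a command's supportedBy value against what each client advertises. The tag values ("default", "msoffice"), the client names, and the `pick_client` function are hypothetical; the source only states that the supportedBy field exists in the Commands table.

```python
def pick_client(command, clients):
    """Return the name of the first client that supports the command.

    command: dict with a "supportedBy" key (hypothetical tag value).
    clients: dict of client name -> set of supportedBy tags it can handle.
    Returns None when no connected client supports the command.
    """
    for name, supported in clients.items():
        if command["supportedBy"] in supported:
            return name
    return None
```

With clients `{"linuxClient": {"default"}, "windowsClient": {"msoffice"}}`, a command tagged "msoffice" would go to the Windows client only, while commands tagged "default" stay on the Linux client.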
Example
-- Commands for handling Video files --
INSERT INTO Commands
    (commandType, command, description)
    -- VALUES SELECT pk FROM FileIDS WHERE description = 'Normalize Defaults'
    VALUES (
    (SELECT pk FROM CommandTypes WHERE type = 'bashScript'),
    ('echo program=\\"ffmpeg\\"\\; version=\\"`ffmpeg 2>&1 | grep \"FFmpeg version\"`\\"'),
    ('Get event detail text for ffmpeg extraction')
);
set @ffmpegEventDetailCommandID = LAST_INSERT_ID();

INSERT INTO Commands
    (commandType, command, outputLocation, eventDetailCommand, verificationCommand, description)
    VALUES (
    (SELECT pk FROM CommandTypes WHERE type = 'bashScript'),
    ('ffmpeg -i "%fileFullName%" -vcodec libx264 -preset medium -crf 18 "%outputDirectory%%prefix%%fileName%%postfix%.mp4"'),
    '%outputDirectory%%prefix%%fileName%%postfix%.mp4',
    @ffmpegEventDetailCommandID,
    @standardVerificationCommand,
    ('Transcoding to mp4 with ffmpeg')
);
set @ffmpegToMP4CommandID = LAST_INSERT_ID();

INSERT INTO Commands
    (commandType, command, outputLocation, eventDetailCommand, verificationCommand, description)
    VALUES (
    (SELECT pk FROM CommandTypes WHERE type = 'bashScript'),
    ('#!/bin/bash
# This file is part of Archivematica.
#
# Copyright 2010-2012 Artefactual Systems Inc. <http://artefactual.com>
#
# Archivematica is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Archivematica is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Archivematica. If not, see <http://www.gnu.org/licenses/>.

# @package Archivematica
# @subpackage transcoder
# @author Joseph Perry <joseph@artefactual.com>
# @version svn: $Id$

inputFile="%fileFullName%"
outputFile="%outputDirectory%%prefix%%fileName%%postfix%.mkv"
audioCodec="pcm_s16le"
videoCodec="ffv1"

audioStreamCount=`ffprobe "${inputFile}" -show_streams 2>&1 | grep "codec_type=audio" -c`
videoStreamCount=`ffprobe "${inputFile}" -show_streams 2>&1 | grep "codec_type=video" -c`

command="ffmpeg -i \"${inputFile}\" "
# set the video codec only when the input has a video stream
if [ ${videoStreamCount} -ge 1 ] ; then
    command="${command} -vcodec ${videoCodec} "
fi
# set the audio codec only when the input has an audio stream
if [ ${audioStreamCount} -ge 1 ] ; then
    command="${command} -acodec ${audioCodec}"
fi
command="${command} ${outputFile}"

addAudioStream=" -acodec ${audioCodec} -newaudio"
addVideoStream=" -vcodec ${videoCodec} -newvideo"

#add additional audio channels
for (( c=1; c<${audioStreamCount}; c++ )); do
    command="${command} ${addAudioStream}"
    #echo $command
done

#add additional video channels
for (( c=1; c<${videoStreamCount}; c++ )); do
    command="${command} ${addVideoStream}"
    #echo $command
done

echo $command
eval $command
'),
    '%outputDirectory%%prefix%%fileName%%postfix%.mkv',
    @ffmpegEventDetailCommandID,
    @standardVerificationCommand,
    ('Transcoding to mkv with ffmpeg')
);
set @ffmpegToMKVCommandID = LAST_INSERT_ID();
-- End of Commands for handling Video files --

-- ADD Normalization Path for .MPEG --
INSERT INTO FileIDs
    (description, validPreservationFormat, validAccessFormat)
    VALUES (
    'A .mpeg file', FALSE, FALSE
);
set @fileID = LAST_INSERT_ID();

INSERT INTO FileIDsByExtension
    (Extension, FileIDs)
    VALUES (
    'mpeg', @fileID
);

INSERT INTO FileIDGroupMembers
    (fileID, groupID)
    VALUES (@fileID, @videoGroup);

INSERT INTO CommandRelationships
    (GroupMember, commandClassification, command, fileID)
    VALUES (
    @fileIDByExtensionDefaultGroupMemberID,
    (SELECT pk FROM CommandClassifications WHERE classification = 'preservation'),
    @ffmpegToMKVCommandID,
    @fileID
);
INSERT INTO TasksConfigs (taskType, taskTypePKReference, description)
    VALUES (8, LAST_INSERT_ID(), 'Normalize preservation');
INSERT INTO MicroServiceChainLinks (microserviceGroup, currentTask, defaultNextChainLink)
    VALUES (@microserviceGroup, LAST_INSERT_ID(), @defaultPreservationNormalizationFailedLink);
set @MicroServiceChainLink = LAST_INSERT_ID();
INSERT INTO MicroServiceChainLinksExitCodes (microServiceChainLink, exitCode, nextMicroServiceChainLink)
    VALUES (@MicroServiceChainLink, 0, @defaultPreservationNormalizationSucceededLink);

INSERT INTO CommandRelationships
    (GroupMember, commandClassification, command, fileID)
    VALUES (
    @fileIDByExtensionDefaultGroupMemberID,
    (SELECT pk FROM CommandClassifications WHERE classification = 'access'),
    @ffmpegToMP4CommandID,
    @fileID
);
INSERT INTO TasksConfigs (taskType, taskTypePKReference, description)
    VALUES (8, LAST_INSERT_ID(), 'Normalize access');
INSERT INTO MicroServiceChainLinks (microserviceGroup, currentTask, defaultNextChainLink)
    VALUES (@microserviceGroup, LAST_INSERT_ID(), @defaultAccessNormalizationFailedLink);
set @MicroServiceChainLink = LAST_INSERT_ID();
INSERT INTO MicroServiceChainLinksExitCodes (microServiceChainLink, exitCode, nextMicroServiceChainLink)
    VALUES (@MicroServiceChainLink, 0, @defaultAccessNormalizationSucceededLink);
-- End Of ADD Normalization Path for .MPEG --
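The commands in the example rely on placeholders such as %fileFullName%, %outputDirectory%, %prefix%, %fileName% and %postfix%. The sketch below shows one way a client might expand them before running a command. The `expand` function is hypothetical, and it assumes that %fileName% means the base name without its extension, which is inferred from the templates appending ".mp4" and ".mkv" themselves rather than stated in the source.

```python
import os


def expand(template, file_full_name, output_directory, prefix="", postfix=""):
    """Replace the transcoder-style placeholders in a command template."""
    # assumption: %fileName% is the base name with the extension removed
    file_name = os.path.splitext(os.path.basename(file_full_name))[0]
    values = {
        "%fileFullName%": file_full_name,
        "%outputDirectory%": output_directory,
        "%prefix%": prefix,
        "%fileName%": file_name,
        "%postfix%": postfix,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template
```

For instance, expanding the outputLocation template of the mp4 command for an input of "/in/clip.mpeg", an output directory of "/out/" and a postfix of "-access" gives "/out/clip-access.mp4".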
Future Development
We are considering building a Format_policy_registry.