Transcoder
Transcode: convert (language or information) from one form of coded representation to another. [Source: Oxford English Dictionary]
Overview
The transcoder is developed by Artefactual for normalization and for generating access copies in the Archivematica system. In earlier versions it was called the normalizer. It tries to identify the file type from the file extension or other metadata, then looks up the actions configured for the identified type. It performs those actions and exits with a zero status if it believes they completed successfully.
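The flow described above (identify by extension, look up configured actions, run them, report an exit status) can be sketched roughly as follows. This is a minimal illustration, not the transcoder's actual code: the `RULES` table, the `{path}` placeholder, and the function names are all assumptions made for the example, and treating "no matching actions" as success is also an assumption.

```python
import os
import subprocess
import sys

# Hypothetical rule table: file extension -> list of argv templates.
# "{path}" is replaced with the file being processed. The single rule here is
# a placeholder that only checks the file can be opened; a real rule would
# invoke a conversion tool.
RULES = {
    ".txt": [[sys.executable, "-c", "import sys; open(sys.argv[1]).close()", "{path}"]],
}


def identify(path):
    """Identify the file type from its extension, case-insensitively."""
    return os.path.splitext(path)[1].lower()


def transcode(path, rules=RULES):
    """Run each configured action in order; return 0 only if all succeed."""
    for template in rules.get(identify(path), []):
        argv = [path if arg == "{path}" else arg for arg in template]
        status = subprocess.run(argv, stderr=subprocess.DEVNULL).returncode
        if status != 0:
            return status  # stop at the first failing action
    return 0  # assumption: no matching actions also counts as success
```

A caller would pass the return value of `transcode(path)` to `sys.exit` so that the process exit status reflects whether the actions completed.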
Transcoder Database
In Archivematica release 0.7.1 alpha, the normalization rules were moved to a database and can be viewed under the preservation planning tab on the dashboard. In future releases, we plan to support modification of these rules through the dashboard interface.
Database Schema
Configuration
Configuration files are located in the /etc/transcoder/ directory.
The transcoder database credentials and server can be set in the dbsettings.conf file.
Development
In the 0.9 release the transcoder was integrated with the MCP.
During transfer processing, file IDs are identified by micro-services and stored against the file in the FilesIdentifiedIDs table.
For normalization processing, the MCP processes each file down a chain. The normalization job for a file checks for command relationships that match the identified file IDs and the proper command classification (normalize preservation, normalize access). For every unique command found in those relationships, the MCP creates a task to be executed by the client. If no commands are identified, the MCP creates a task with the default command from the DefaultCommandsForClassifications table, if one is defined.
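The selection rule in the previous paragraph can be sketched as below. The in-memory lists stand in for rows of the CommandRelationships and DefaultCommandsForClassifications tables; the dictionary keys, the classification strings, and the function name are assumptions for illustration, not the MCP's real schema or code.

```python
# Hypothetical stand-ins for database rows; key names are assumptions.
COMMAND_RELATIONSHIPS = [
    {"pk": 1, "fileID": 10, "classification": "access", "command": 100},
    {"pk": 2, "fileID": 11, "classification": "access", "command": 100},
    {"pk": 3, "fileID": 10, "classification": "preservation", "command": 200},
]
DEFAULT_COMMANDS = {"access": 900}  # classification -> default command pk


def commands_for(file_ids, classification):
    """Return the unique commands to run for one file and one classification."""
    found = []
    for rel in COMMAND_RELATIONSHIPS:
        matches = rel["fileID"] in file_ids and rel["classification"] == classification
        if matches and rel["command"] not in found:
            found.append(rel["command"])  # one task per unique command
    if found:
        return found
    # Nothing matched: fall back to the default command, if one is defined.
    default = DEFAULT_COMMANDS.get(classification)
    return [default] if default is not None else []
```

With this data, a file identified as both 10 and 11 yields a single access task, because both relationships point at the same command.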
Integration with the MCP was done by relating commands to micro-service chain links. The Transcoder links (MicroServiceChainLinks of this type) have a one-to-one relationship with the TasksConfigs, which have a one-to-one relationship with the CommandRelationships. The protocol between the client and server is based on the CommandRelationship's pk: the MCP assigns a task to the client to perform commandRelationship x on file y (identified by fileUUID). The client can then pull the information required to execute the command from the database.
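As a rough sketch of that protocol, a task only needs to carry the CommandRelationship pk and the fileUUID, and the client resolves the rest from the database. The example uses an in-memory SQLite database in place of the real MySQL one, and the column names, the `Task` tuple, and the stored command text are assumptions.

```python
import sqlite3
from collections import namedtuple

# What the MCP sends to the client: which relationship to run on which file.
Task = namedtuple("Task", ["command_relationship_pk", "file_uuid"])


def make_db():
    """Build a tiny stand-in database (schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Commands (pk INTEGER PRIMARY KEY, command TEXT);
        CREATE TABLE CommandRelationships (
            pk INTEGER PRIMARY KEY,
            command INTEGER REFERENCES Commands(pk)
        );
        INSERT INTO Commands VALUES (1, 'echo normalizing');
        INSERT INTO CommandRelationships VALUES (7, 1);
    """)
    return db


def resolve(db, task):
    """Client side: pull the command text for the task from the database."""
    row = db.execute(
        "SELECT c.command FROM CommandRelationships cr "
        "JOIN Commands c ON c.pk = cr.command WHERE cr.pk = ?",
        (task.command_relationship_pk,),
    ).fetchone()
    return row[0] if row else None
```

Keeping the message this small means the server never has to ship command text to clients; both sides only need to agree on primary keys.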
Why the change? So that not every client has to support every normalization tool, tasks need to be assigned by tool availability. Currently the archivematica-client package depends on all the required tools, but there are situations where assignment by availability will be required. None of these is currently implemented, but an example would be normalizing on a Windows machine using Microsoft Office. The Windows machine could theoretically run a client, but it could not support the standard Archivematica tools, as they are Linux-based. To differentiate the two, use the supportedBy field in the Commands table.
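One way to picture assignment by tool availability is to filter commands on the supportedBy field. The values stored in supportedBy and the row layout below are invented for illustration; the source does not specify them.

```python
# Hypothetical rows from the Commands table; supportedBy values are invented.
COMMANDS = [
    {"pk": 1, "description": "video normalization", "supportedBy": "linux-tools"},
    {"pk": 2, "description": "office document to PDF", "supportedBy": "windows-office"},
]


def commands_for_client(client_supports, commands=COMMANDS):
    """Return the pks of commands a client can run, given what it supports."""
    return [c["pk"] for c in commands if c["supportedBy"] in client_supports]
```

A Linux client advertising `{"linux-tools"}` would only be offered command 1, while a Windows client advertising `{"windows-office"}` would only be offered command 2.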
Example
Normalization commands are created as part of the Archivematica install. They are kept in the database, which is populated on install by the /usr/share/archivematica/transcoder/mysql SQL script.
Create Commands
First, create the command(s) that need to run. These commands can even be complete scripts. The command type needs to be defined; the supported command types are listed in the CommandTypes table. You may also wish to create a special command for getting the event detail text for the event.
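A minimal sketch of this step, using an in-memory SQLite database in place of the real MySQL one. The column names, the three command type names, and the stored ffmpeg command lines are assumptions for illustration; consult the installed schema for the real definitions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE CommandTypes (pk INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE Commands (
        pk INTEGER PRIMARY KEY,
        commandType INTEGER REFERENCES CommandTypes(pk),
        command TEXT,
        description TEXT,
        eventDetailCommand INTEGER REFERENCES Commands(pk)
    );
    INSERT INTO CommandTypes (type) VALUES ('command'), ('bashScript'), ('pythonScript');
""")


def create_command(type_name, command, description, event_detail_pk=None):
    """Insert a command, refusing types that are not in CommandTypes."""
    row = db.execute(
        "SELECT pk FROM CommandTypes WHERE type = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError("unsupported command type: %s" % type_name)
    cur = db.execute(
        "INSERT INTO Commands (commandType, command, description, eventDetailCommand) "
        "VALUES (?, ?, ?, ?)",
        (row[0], command, description, event_detail_pk),
    )
    return cur.lastrowid


# The event detail command is created first so the main command can refer to it.
detail_pk = create_command("bashScript", "ffmpeg -version | head -n 1", "event detail text")
normalize_pk = create_command(
    "command", "ffmpeg -i input.mpg output.mp4", "example normalization", detail_pk
)
```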
Create FileIds
Second, create the file type. The FileIDs entry is a catch-all, intended for future releases that support more than one type of file identification. Every file identification will have a unique corresponding entry in FileIDs. The validPreservationFormat and validAccessFormat fields relate to what appears in the normalization report; they identify files at risk of format obsolescence, with a failed or missing normalization command. The FileIDsByExtension entry is what links a '.mpg' file to the fileID. Files are related to their extension fileIDs in the 'Identify Files ByExtension' micro-service, which creates an entry in the FilesIdentifiedIDs table.
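The three tables named in this step can be sketched together as follows, again with SQLite standing in for MySQL. Column names beyond validPreservationFormat and validAccessFormat are assumptions, and `identify_by_extension` is a hypothetical stand-in for what the micro-service does.

```python
import os
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE FileIDs (
        pk INTEGER PRIMARY KEY,
        description TEXT,
        validPreservationFormat INTEGER DEFAULT 0,
        validAccessFormat INTEGER DEFAULT 0
    );
    CREATE TABLE FileIDsByExtension (
        pk INTEGER PRIMARY KEY,
        extension TEXT,
        fileID INTEGER REFERENCES FileIDs(pk)
    );
    CREATE TABLE FilesIdentifiedIDs (
        pk INTEGER PRIMARY KEY,
        fileUUID TEXT,
        fileID INTEGER REFERENCES FileIDs(pk)
    );
""")

# The generic FileIDs entry, then the extension rule that points at it.
mpg_file_id = db.execute(
    "INSERT INTO FileIDs (description, validPreservationFormat, validAccessFormat) "
    "VALUES (?, ?, ?)",
    ("A .mpg file", 0, 0),
).lastrowid
db.execute(
    "INSERT INTO FileIDsByExtension (extension, fileID) VALUES (?, ?)",
    (".mpg", mpg_file_id),
)


def identify_by_extension(file_uuid, file_name):
    """Record the fileID matching the file's extension; None if no rule matches."""
    ext = os.path.splitext(file_name)[1].lower()
    row = db.execute(
        "SELECT fileID FROM FileIDsByExtension WHERE extension = ?", (ext,)
    ).fetchone()
    if row is None:
        return None
    db.execute(
        "INSERT INTO FilesIdentifiedIDs (fileUUID, fileID) VALUES (?, ?)",
        (file_uuid, row[0]),
    )
    return row[0]
```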
Create relationship between command and fileID
Third, create the relationship between the command and the file identification format. The relationship plays a role (preservation or access) defined in the commandClassification. Note that the fileID references the FileIDs table, not the FileIDsByExtension table. The commandClassification was part of some testing of prioritizing normalization commands based on file identification types (using more than one file identification method); its default value is @fileIDByExtensionDefaultGroupMemberID (0), even if left undefined.
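A sketch of the relationship row, under the same caveats as before: SQLite stands in for MySQL, and the CommandClassifications table layout, the column names, and the `relate` helper are assumptions. The command and fileID values are assumed to be primary keys of rows created in the earlier steps.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE CommandClassifications (pk INTEGER PRIMARY KEY, classification TEXT);
    CREATE TABLE CommandRelationships (
        pk INTEGER PRIMARY KEY,
        commandClassification INTEGER REFERENCES CommandClassifications(pk),
        command INTEGER,  -- pk of a row in Commands
        fileID INTEGER    -- pk of a row in FileIDs, not FileIDsByExtension
    );
    INSERT INTO CommandClassifications (classification) VALUES ('preservation'), ('access');
""")


def relate(command_pk, file_id_pk, role):
    """Link a command to a FileIDs entry in the given role."""
    row = db.execute(
        "SELECT pk FROM CommandClassifications WHERE classification = ?", (role,)
    ).fetchone()
    if row is None:
        raise ValueError("unknown role: %s" % role)
    cur = db.execute(
        "INSERT INTO CommandRelationships (commandClassification, command, fileID) "
        "VALUES (?, ?, ?)",
        (row[0], command_pk, file_id_pk),
    )
    return cur.lastrowid
```

For example, `relate(5, 3, "access")` would record that command 5 produces the access copy for files identified as FileIDs entry 3.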
Create processing link to execute
Lastly, create the MicroServiceChainLink to be processed by the MCP, containing the relationship between the link and the CommandRelationship.
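The final step can be sketched as a chain link that points at a task configuration, which in turn points at the CommandRelationship. The column names here are assumptions, and the UNIQUE constraints are one possible way to express the one-to-one relationships described in the Development section, not necessarily how the real schema does it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE TasksConfigs (
        pk INTEGER PRIMARY KEY,
        commandRelationship INTEGER UNIQUE,  -- pk of a row in CommandRelationships
        description TEXT
    );
    CREATE TABLE MicroServiceChainLinks (
        pk INTEGER PRIMARY KEY,
        currentTask INTEGER UNIQUE REFERENCES TasksConfigs(pk)
    );
""")


def create_link(command_relationship_pk, description):
    """Create the task config and the chain link that executes it."""
    task_pk = db.execute(
        "INSERT INTO TasksConfigs (commandRelationship, description) VALUES (?, ?)",
        (command_relationship_pk, description),
    ).lastrowid
    return db.execute(
        "INSERT INTO MicroServiceChainLinks (currentTask) VALUES (?)", (task_pk,)
    ).lastrowid


def relationship_for_link(link_pk):
    """Follow link -> task config -> CommandRelationship pk."""
    row = db.execute(
        "SELECT t.commandRelationship FROM MicroServiceChainLinks l "
        "JOIN TasksConfigs t ON t.pk = l.currentTask WHERE l.pk = ?",
        (link_pk,),
    ).fetchone()
    return row[0] if row else None
```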
Complete Code
Future Development
We are considering building a Format policy registry.