OCR text in DIP

Requirements

See related issues:

Add open-source OCR tool to Archivematica
- Tesseract is the most actively developed; the other open-source OCR tools are either moribund (Cuneiform, OCRopus) or released only sporadically without major improvements (OCRad, gOCR).
- The options that still see releases are:
  - Tesseract - Very actively developed. Best accuracy of all the open-source options, good speed.
  - OCRad - Moderate-to-poor accuracy, excellent speed.
  - gOCR - Poor accuracy, slow speed.
- Tesseract appears to be the best solution in most cases. If speed were paramount and mediocre or poor accuracy were acceptable, there might be an argument for using OCRad.
Add micro-service to OCR files in DIP (post-normalization)
- Micro-service: Transcription
- User choice Yes/No (default NO)
Add FPR purpose "Transcription", with OCR as the first and only tool in that section
Add OCR files to DIP in "[DIP]/OCRfiles" directory
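
A minimal sketch of how the transcription step could call Tesseract and place its output in the DIP, assuming the `tesseract` command-line tool is installed and on the PATH. The function names, the `enabled` flag, and the example paths are hypothetical and only illustrate the requirements above; they are not Archivematica's actual micro-service code.

```python
from pathlib import Path
import subprocess


def ocr_command(image_path, dip_dir):
    """Build the Tesseract command and the text path under <dip_dir>/OCRfiles."""
    image = Path(image_path)
    out_dir = Path(dip_dir) / "OCRfiles"
    # Tesseract takes an output *base* and appends ".txt" to it itself.
    output_base = out_dir / image.stem
    text_path = out_dir / (image.stem + ".txt")
    return ["tesseract", str(image), str(output_base)], text_path


def run_ocr(image_path, dip_dir, enabled=False):
    """Run OCR only when the user opted in (hypothetical flag; default is No)."""
    if not enabled:
        return None
    command, text_path = ocr_command(image_path, dip_dir)
    text_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    return text_path
```

Calling `run_ocr("objects/page1.tif", "/path/to/DIP", enabled=True)` would be expected to leave the text in `/path/to/DIP/OCRfiles/page1.txt`. With the default `enabled=False` nothing is run, mirroring the "default NO" user choice.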
Open question: run OCR on originals or on preservation copies?
Add OCR files to AIP in "AIP/data/objects/metadata/OCRfiles"
- METS file: PREMIS event "Transcription" (add to PREMIS events)
- METS file: use the text/ocr fileGrp described at https://www.archivematica.org/wiki/METS#.3CfileSec.3E
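
As a rough illustration of the fileGrp requirement, the sketch below builds a METS `fileGrp` element with `USE="text/ocr"` using Python's standard library. The file IDs, the example path, and the `LOCTYPE`/`OTHERLOCTYPE` values are assumptions made for illustration; the real METS writer may structure these differently.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def ocr_file_group(relative_paths):
    """Return a <mets:fileGrp USE="text/ocr"> element listing the OCR text files."""
    group = ET.Element(f"{{{METS_NS}}}fileGrp", {"USE": "text/ocr"})
    for index, path in enumerate(relative_paths, start=1):
        # Hypothetical ID scheme; real IDs would come from the file UUIDs.
        file_el = ET.SubElement(
            group, f"{{{METS_NS}}}file", {"ID": f"ocr-file-{index}"}
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {
                f"{{{XLINK_NS}}}href": path,
                "LOCTYPE": "OTHER",        # assumed values
                "OTHERLOCTYPE": "SYSTEM",  # assumed values
            },
        )
    return group


if __name__ == "__main__":
    group = ocr_file_group(["objects/metadata/OCRfiles/page1.txt"])
    print(ET.tostring(group, encoding="unicode"))
```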
Add configuration setting to administrative tab of the dashboard to pre-select OCR options