Difference between revisions of "File Browser Requirements"

= More to come on this...=
*full-text indexing (Tika/Lucene/ElasticSearch) on recognizable text files as well as data visualization (ElasticSearch/Protovis) to assist with:
* See Transfer indexing requirements [[Transfer_and_SIP_creation#Transfer_indexing_requirements_0.9_and_beyond]]
        identifying keywords for security classification and appraisal
* See Issue 924
        identify duplicates and close matches (using hex values?)
        high-level sorting and grouping of batches of objects into SIPs (incl. tracking difference between 'original order' and new physical/logical arrangement)
*Archivematica compliant SIP creation, including:
* keyword & pattern matching for privacy/security sensitive information (e.g. social insurance numbers, credit card numbers, 'private', 'confidential')
* list of PDFs that have not been OCR'ed
* list of password protected / encrypted files
[[Category:Development documentation]]

Revision as of 15:08, 17 May 2012


This page describes requirements for moving the entire Archivematica interface to the web dashboard, including file browser functionality for transfer and SIP configuration, normalization output review, and AIP and DIP review prior to upload to storage and access. (Mockups forthcoming.)


  • Log in archivist: the archivist's name will be attached to all processing actions (PREMIS agent).
  • Create structured directory (create three directories: logs, metadata, objects)


  • Restructure transfer for processing (put all contents into the objects directory and create empty logs and metadata directories). This should be automated.
  • Run a checksum on the transfer and place the result in a file called checksum.md5 in the metadata folder (checkbox). This is in case the transfer hasn't arrived with a checksum already. (This was discussed: currently, checksum assignment and checksum verification should occur at the beginning of the transfer, prior to backup. The process should be automated.) In fact, all transfers will go through all transfer microservices before they are "backlogged".
  • Backlog should be indexed
  • Assign transfer type (drop-down, user configurable: Generic, DSpace export, digitization output (Issue 713), VanDocs export, BagIt package (Issue 593))
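The restructuring and checksum steps above could be automated roughly as follows. This is a minimal sketch, not the Archivematica microservice code: the function names `restructure_transfer` and `write_checksums` are hypothetical, and the md5sum-style manifest layout (hash, two spaces, relative path) is an assumption, since this page only names the file checksum.md5.

```python
import hashlib
import shutil
from pathlib import Path

STANDARD_DIRS = ("logs", "metadata", "objects")


def restructure_transfer(transfer_dir: Path) -> None:
    """Move all existing contents into objects/ and create logs/ and metadata/."""
    objects = transfer_dir / "objects"
    objects.mkdir(exist_ok=True)
    # Snapshot the listing first, because we move entries while iterating.
    for entry in list(transfer_dir.iterdir()):
        if entry.name not in STANDARD_DIRS:
            shutil.move(str(entry), str(objects / entry.name))
    for name in ("logs", "metadata"):
        (transfer_dir / name).mkdir(exist_ok=True)


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large objects are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(transfer_dir: Path) -> Path:
    """Write metadata/checksum.md5 covering every file under objects/."""
    manifest = transfer_dir / "metadata" / "checksum.md5"
    files = sorted(p for p in (transfer_dir / "objects").rglob("*") if p.is_file())
    lines = [
        # Assumed layout: "<md5>  <path relative to the transfer root>"
        f"{md5_of(p)}  {p.relative_to(transfer_dir).as_posix()}\n"
        for p in files
    ]
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

Calling `restructure_transfer(path)` followed by `write_checksums(path)` on a freshly received transfer would leave only the three standard directories at the top level, with one manifest line per object.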

0.9 Transfer.png

Transfer backup requirements: See Transfer_Backup_Requirements

Create SIP

0.9 CreateSIP.png

  • Below this, on the same page, would be the current 0.8 dashboard for Ingest, including popouts for tasks and decision points.
  • Original order and arrangement (Issue 964) are captured as a logical <structMap> in the SIP METS file.
  • Report of actions in Create SIP is auto-generated
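To illustrate how a directory arrangement might be captured as a logical <structMap>, here is a minimal sketch using Python's standard library. The helper name `build_struct_map` and the TYPE values "directory" and "item" are assumptions for illustration only. A complete METS file would also carry <fptr> elements linking each div to the file section, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def _add_divs(parent: ET.Element, directory: Path) -> None:
    """Recursively add one <div> per directory entry, in name order."""
    for entry in sorted(directory.iterdir(), key=lambda p: p.name):
        kind = "directory" if entry.is_dir() else "item"  # assumed TYPE values
        div = ET.SubElement(
            parent, f"{{{METS_NS}}}div", {"TYPE": kind, "LABEL": entry.name}
        )
        if entry.is_dir():
            _add_divs(div, entry)


def build_struct_map(root_dir: Path, label: str) -> ET.Element:
    """Return a logical <structMap> mirroring the tree under root_dir."""
    struct_map = ET.Element(
        f"{{{METS_NS}}}structMap", {"TYPE": "logical", "LABEL": label}
    )
    top = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"TYPE": "directory", "LABEL": root_dir.name}
    )
    _add_divs(top, root_dir)
    return struct_map
```

Building one map before rearrangement (for example with the label "Original order") and another afterwards would give two <structMap> elements that can sit side by side in the same METS file and be compared.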

File Viewer

  • This will allow the user to see individual documents in the transfer to get a better idea of their contents and technical metadata before assigning them to SIPs.
  • Viewers are browser-dependent; viewer option is greyed out if viewer is not supported in browser
  • The Examine Contents window allows for viewing technical metadata and other metadata available after the Transfer microservices have run, as well as indexing metadata

Examine Contents

  • Opens in new tab
  • This will allow the user to examine the contents of a SIP for keywords, use visualization tools, identify restricted records (by keyword or by type: credit card number, SIN, etc.), tag records or groups of records, and apply some basic metadata that will be carried over to the description.
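As a rough illustration of identifying restricted records by keyword or pattern, the sketch below scans extracted text for the keywords 'private' and 'confidential' and for candidate credit card numbers and SINs. The regular expressions and the function name `scan_text` are assumptions, not part of any Archivematica component. Candidates are filtered with the Luhn check digit test, which applies to both card numbers and Canadian SINs, to reduce false positives.

```python
import re

KEYWORDS = ("private", "confidential")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
# Assumed pattern: nine digits, optionally grouped 3-3-3 with spaces or hyphens.
SIN_RE = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")
# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text: str) -> dict:
    """Report keyword hits and Luhn-valid SIN and credit card candidates."""
    return {
        "keywords": sorted({m.group().lower() for m in KEYWORD_RE.finditer(text)}),
        "sin": [m.group() for m in SIN_RE.finditer(text) if luhn_valid(m.group())],
        "credit_card": [
            m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())
        ],
    }
```

A hit from such a scan would only flag a record for the archivist's review. Passing the Luhn test does not prove that a number is a real SIN or card number.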


  • Assign transfer backup location (what is it backing up? at which stage?) This could be a staging area, in case many transfers will become one SIP or more time is needed to begin processing ADMIN
  • Go to storage reports in DB on individual storage locations - see Issue 882
  • Transfer store = Backlog; metadata: name, accession #, UUID, transfer type, archivist, size of contents, number of files, date, file types? and other indexing results?, any reports? (FITS, ClamAV, etc.)
    • Accessions managed in ICA-AtoM or other content management system, linked in DB and in tab

0.9 AdministrationTab.png

More to come on this...

  • Archivematica compliant SIP creation, including:
       log of original directory structure, diff to new structure (using METS <structMap> to represent both)
       use METS <structMap> (or physical directory arrangement of SIP?) to represent archival arrangement for rebuild in access system (see Issue 380)
  • Change permissions of directory contents recursively (or should this be integrated in every step to avoid problems?)
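A recursive permission change could look something like the sketch below. The function name and the default modes (0o755 for directories, 0o644 for files) are assumptions chosen for illustration, since this page does not specify which permissions are required. Symbolic links are skipped so that targets outside the SIP are not modified.

```python
import os
from pathlib import Path


def set_permissions_recursively(
    root: Path, dir_mode: int = 0o755, file_mode: int = 0o644
) -> int:
    """Apply dir_mode to directories and file_mode to files under root.

    Returns the number of paths whose mode was set.
    """
    changed = 0
    os.chmod(root, dir_mode)
    changed += 1
    for current, dirnames, filenames in os.walk(root):
        for name, mode in [(d, dir_mode) for d in dirnames] + [
            (f, file_mode) for f in filenames
        ]:
            path = os.path.join(current, name)
            if os.path.islink(path):
                continue  # do not follow links out of the tree
            os.chmod(path, mode)
            changed += 1
    return changed
```

Whether this runs once or after every processing step is the open question raised in the bullet above. The sketch only shows the single-pass variant.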