Difference between revisions of "Transfer backlog requirements"

From Archivematica

Latest revision as of 16:27, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation (https://www.archivematica.org/docs/latest/) for up-to-date information.

Proposed improvements

Handle cross-pipeline backlog

March 2017

Summary: It should be possible to start a backlogged transfer as a SIP (through the Appraisal tab, SIP arrange, or Backlog tab) on a different pipeline from the one that put it into backlog.

Problem: Much of the information needed during Ingest is stored only in the pipeline database. If a Transfer is put into backlog and ingest is started on a different pipeline, that information is not available.

Proposed fix: The transfer METS file should be extended to contain all the information needed, so that it can be parsed back into the database at the start of ingest.

Backlogged transfer data currently lives in four places: the pipeline database, the Elasticsearch transfers index, the Storage Service database and the transfer METS file. This should be consolidated into a single canonical location from which the other sources can be rebuilt.

The METS file is treated as the canonical metadata store in the AIP and is a good choice for the canonical metadata store for a transfer in backlog. Since all of the information needed in ingest ends up in the AIP METS file, we know that it can all be stored in a METS file. We already parse the AIP METS file into the database on full reingest, so there are precedents and examples for doing this.

Tables needed include:

  • Transfers
  • Files
  • DublinCore
  • RightsStatement & related
  • Events
  • Events_agents (link between Event & Agent)
  • Agents (older version of AM?)
  • FilesIdentifiedIDs (file ID info)
  • FilesIDs (more file ID info)
  • main_fpcommandoutput (characterization)
  • Jobs or Tasks may also be required for status checking?
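
The parse-back step could look roughly like the sketch below, which reads PREMIS events out of a METS document and shapes them as rows for an Events table. This is only an illustration under stated assumptions: the sample METS layout is simplified, and the function name `events_from_mets` and the row keys are hypothetical, not taken from the Archivematica code base.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS and PREMIS 2.x. The document layout below is a
# simplified assumption, not the actual Archivematica transfer METS profile.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "info:lc/xmlschema/premis-v2",
}

SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:premis="info:lc/xmlschema/premis-v2">
  <mets:amdSec ID="amdSec_1">
    <mets:digiprovMD ID="digiprovMD_1">
      <mets:mdWrap MDTYPE="PREMIS:EVENT">
        <mets:xmlData>
          <premis:event>
            <premis:eventIdentifier>
              <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
              <premis:eventIdentifierValue>35cbe00d-d661-4174-b11a-e203f5608008</premis:eventIdentifierValue>
            </premis:eventIdentifier>
            <premis:eventType>registration</premis:eventType>
            <premis:eventDateTime>2012-03-14</premis:eventDateTime>
          </premis:event>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>
"""


def events_from_mets(mets_xml):
    """Return one dict per PREMIS event, shaped like a row for an Events table.

    The row keys are hypothetical column names chosen for this example.
    """
    root = ET.fromstring(mets_xml)
    rows = []
    for event in root.iterfind(".//premis:event", NS):
        rows.append({
            "event_uuid": event.findtext(
                "premis:eventIdentifier/premis:eventIdentifierValue",
                namespaces=NS),
            "event_type": event.findtext("premis:eventType", namespaces=NS),
            "event_datetime": event.findtext(
                "premis:eventDateTime", namespaces=NS),
        })
    return rows


rows = events_from_mets(SAMPLE_METS)
```

The same pattern would have to be repeated for each of the tables listed above (files, Dublin Core, rights, agents and so on), each with its own mapping from METS sections to columns.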

The fix also needs to handle transfers that were put in backlog before the transfer METS was updated. It should be straightforward to handle backlogged transfers where the pipeline still exists, as the data is all in the database. For backlogged transfers where the pipeline no longer exists, truncated, stub or default data could be used, or the transfer could be required to be re-run through transfer.
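
The three cases in the paragraph above can be written down as a small decision function. This is a sketch only; the names `RestoreSource` and `choose_restore_source` and the two boolean inputs are hypothetical and stand in for whatever checks an implementation would actually perform.

```python
from enum import Enum


class RestoreSource(Enum):
    """Where ingest metadata for a backlogged transfer would come from."""
    TRANSFER_METS = "parse the extended transfer METS"
    PIPELINE_DATABASE = "read rows from the originating pipeline database"
    STUB_OR_RERUN = "use stub/default data or re-run the transfer"


def choose_restore_source(mets_is_complete, origin_pipeline_exists):
    """Pick a metadata source for starting ingest from backlog.

    mets_is_complete: the transfer METS was written after the proposed update.
    origin_pipeline_exists: the pipeline that backlogged the transfer is
    still reachable, so its database still holds the data.
    """
    if mets_is_complete:
        return RestoreSource.TRANSFER_METS
    if origin_pipeline_exists:
        return RestoreSource.PIPELINE_DATABASE
    return RestoreSource.STUB_OR_RERUN
```

For example, a transfer backlogged before the METS update on a pipeline that has since been removed falls through to `RestoreSource.STUB_OR_RERUN`.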

Original requirements

Added in Archivematica 1.0

Transfer Backlog Management

  • Related issues: Issue 951, Issue 1220, Issue 1141, Issue 1225, Issue 1257

Requirements for transfer backlog search

  • Add ability to search transfer backlog and send one or more transfers to Ingest
  • Add ability to download and/or view files/transfers (via right click)
  • Search the following fields: Any field, transfer name, file name, accession number, PUID, Mimetype, Date - Ingest
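
As an illustration of the field list above, the sketch below turns a chosen search field and term into an Elasticsearch-style query body. The index field names in `FIELD_MAP` and the function `build_backlog_query` are assumptions made for this example; the page does not specify how the transfers index is mapped.

```python
# Hypothetical mapping from the search form's field labels to index field
# names. The real index mapping is not described on this page.
FIELD_MAP = {
    "transfer name": "transfer_name",
    "file name": "filename",
    "accession number": "accession_id",
    "PUID": "puid",
    "Mimetype": "mimetype",
    "Date - Ingest": "ingest_date",
}


def build_backlog_query(field, term):
    """Build a query body for one search field and one search term."""
    if field == "Any field":
        # Search across all indexed fields.
        return {"query": {"query_string": {"query": term}}}
    # Search a single named field.
    return {"query": {"match": {FIELD_MAP[field]: term}}}


puid_query = build_backlog_query("PUID", "fmt/43")
any_query = build_backlog_query("Any field", "minutes")
```

The resulting dictionaries follow the general shape of the Elasticsearch query DSL (`match` for a single field, `query_string` for a free-text search) and would be passed as the search body to whichever client the dashboard uses.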

Mockup of transfer backlog search

1.0 TransferBacklogSearch.png
1.0 TransBacklogSearchResults.png

Transfer Workflow

  • Administration - allow MCP access to media or storage where transfer is located
  • Assign accession number to transfer
  • Remove transfer backup from workflow - no longer a processing configuration option
  • Add Send transfer to backlog microservice
  • Add Search transfer backlog tab from Ingest in Dashboard
  • Add ability to download and/or view transfers and files from Search tab
  • Add ability to send transfers from backlog search to Ingest/Create SIP (checkboxes, send button)
  • see workflow diagrams below

0.9 Transfer workflow

  • grey steps are automated, white are manual
TransferWorkflow0.9.png
TransferWorkflow0.9pt2.png

transferWorkflow0.9.pdf

Administration Tab in Dashboard

  • Assign permission and access to the MCPServer to copy from transfer media (hard drives, optical media, USB, etc.) or network location.
  • Assign transfer backlog locations (configuration is done outside of AM)
  • Assign source directories
  • Define transfer types
  • Assign report locations (post-1.0)
  • Set AIP storage location
  • Set DIP upload location

Accession metadata

  • PREMIS Event = Registration
        <event>
          <eventIdentifier>
            <eventIdentifierType>UUID</eventIdentifierType>
            <eventIdentifierValue>35cbe00d-d661-4174-b11a-e203f5608008</eventIdentifierValue>
          </eventIdentifier>
          <eventType>registration</eventType>
          <eventDateTime>2012-03-14</eventDateTime>
          <eventDetail></eventDetail>
          <eventOutcomeInformation>
            <eventOutcome></eventOutcome>
            <eventOutcomeDetail>
              <eventOutcomeDetailNote>accession#2012-029</eventOutcomeDetailNote>
            </eventOutcomeDetail>
          </eventOutcomeInformation>
          <linkingAgentIdentifier>
            <linkingAgentIdentifierType>archivist</linkingAgentIdentifierType>
            <linkingAgentIdentifierValue>Courtney Mumma</linkingAgentIdentifierValue>
          </linkingAgentIdentifier>
        </event>
  • Manually input metadata in template on dashboard (see File_Browser_Requirements): accession number
  • Agent is the archivist logged in at the time of the accession (post-1.0; for 1.0 this will still be the repository)
  • Event name is "registration" (to be added to PREMIS events master list should we decide to implement)
  • UUID
  • Also see Issue 787 on the Archivematica issues list
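
A registration event with the structure shown above could be generated along the following lines. This is a minimal sketch using Python's standard library; the function name `build_registration_event` and its parameters are hypothetical, and the element layout simply mirrors the example on this page rather than any particular PREMIS schema version.

```python
import uuid
import xml.etree.ElementTree as ET


def build_registration_event(accession_number, agent_name, event_date,
                             event_uuid=None):
    """Build a PREMIS-style "registration" event as an ElementTree element.

    The accession number is recorded in eventOutcomeDetailNote and the
    archivist as the linking agent, as in the example above.
    """
    event = ET.Element("event")

    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "UUID"
    ET.SubElement(identifier, "eventIdentifierValue").text = (
        event_uuid or str(uuid.uuid4()))

    ET.SubElement(event, "eventType").text = "registration"
    ET.SubElement(event, "eventDateTime").text = event_date
    ET.SubElement(event, "eventDetail")

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome_info, "eventOutcome")
    detail = ET.SubElement(outcome_info, "eventOutcomeDetail")
    ET.SubElement(detail, "eventOutcomeDetailNote").text = (
        f"accession#{accession_number}")

    agent = ET.SubElement(event, "linkingAgentIdentifier")
    ET.SubElement(agent, "linkingAgentIdentifierType").text = "archivist"
    ET.SubElement(agent, "linkingAgentIdentifierValue").text = agent_name
    return event


event = build_registration_event(
    "2012-029", "Courtney Mumma", "2012-03-14",
    event_uuid="35cbe00d-d661-4174-b11a-e203f5608008")
xml_text = ET.tostring(event, encoding="unicode")
```

When no `event_uuid` is supplied, a fresh version 4 UUID is generated, which matches the "UUID" item in the list above.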

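Microservices Completed Before Move to Backlog

  • All transfer microservices
  • Indexing: See Transfer_and_SIP_creation#Transfer_indexing_requirements_0.9_and_beyond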

Handling of Submission Documentation

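  • TAPER (http://sites.tufts.edu/dca/about-us/research-initiatives/taper-tufts-accessioning-program-for-electronic-records/)?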
  • Normalized with objects in AIP (0.8)
  • Upload submission documentation with transfer in transfer tab - Issue 1255

Search transfers from Archival Storage

New sponsored development planned for Archivematica 1.6 or later will allow users to manage the transfer backlog through the Archival Storage tab, following the general workflows shown in this diagram:

Transfer management workflows.png

As outlined above, users will be able to:

  • Search transfers from archival storage tab
  • Download copies of transfers or selected files from archival storage tab
  • Make transfer deletion requests from archival storage tab

Transfer search user stories

As an archivist, I need to find transfers by searching...

  • by the name of the transfer
  • by the date the transfer was stored in backlog
  • by names of files within the transfer
  • by....?

Mockups: Version 1

Search transfers from Archival Storage:

Archival Storage Transfer Search.png

Notes:

  • Click on "Show transfers" to search AIPs as well as Transfers
  • A new column in the table indicates whether a package is a Transfer or an AIP.
  • To trigger transfer deletion, click on red "remove" icon (same functionality as AIP deletion)

Search files from transfers in Archival Storage:

Archival Storage Transfer file search.png

Notes:

  • Clicking on both "Show files" and "Show transfers" before searching will load a preview of files from the transfer backlog.
  • The right column shows the UUID of the package and whether the file is from a Transfer or an AIP.

Mockups: Version 2

In this version, toggling between searching for AIPs/Transfers is done through a tab at the top. This makes the development significantly less complicated, as we would not need to combine the Elasticsearch indexes for transfer METS and AIP METS.

Transfer search:

Transfer search v2.png

Files within transfer search:

Transfer search files v2.png