Difference between revisions of "UM transfer 1.0"
Latest revision as of 12:48, 26 February 2014
General description
In Archivematica, Transfer is the process of transforming any set of digital objects and/or directories into a SIP. Transformation may include appraisal, arrangement, description and identification of donor-restricted, private or confidential contents.
In the Transfer tab of the Dashboard, the user moves digital objects from source directories accessible via the Storage Service into Archivematica. See Administrator manual - Storage Service for instructions on how to set up shared transfer source directories. Once uploaded to the dashboard, transfers run through several micro-services: UUID assignment; checksum verification (if checksums are present); package extraction (i.e. unzipping of zipped or otherwise packaged files); virus checking; indexing; format identification and validation; and metadata extraction.
At the end of transfer, the user creates a SIP from one or more standard transfer(s). Once this is done, the SIP is moved into ingest.
If you would like to import lower-level metadata with your transfer (i.e. metadata to be attached to subdirectories and files within a SIP), see Metadata import.
If your transfer is a DSpace export, please see DSpace export.
If your transfer is a bag or a zipped bag, please see Bags.
If your transfer is composed of objects that are the result of digitization, please see Digitization output.
If you would like to skip some of the default decision points or make preconfigured choices for your desired workflow, see User manual - User administration - Processing configuration.
Create a transfer
Open a web browser, go to the Archivematica dashboard and sign in with your username and password. Please note that the first time you log in to a newly installed instance of Archivematica 1.0, you will see a log-in screen that allows you to register your repository and get updates to the Format Policy Registry (FPR).
- In the Transfer tab, select your transfer type in the dropdown menu. Types include standard, unzipped bag, zipped bag, DSpace and maildir.
- In the Transfer tab, name your transfer and browse to a source directory to select your object or set of objects for upload. Your transfer can be composed of multiple directories from different sources. Repeat this step for each additional source. (figure 1)
- If applicable, enter an accession number for the transfer.
- Once all of your digital object sources have been uploaded, click the Start Transfer button to begin transfer processing. (figure 2)
Create a transfer with submission documentation
Another option is to create a transfer in a structured directory prior to beginning processing in Archivematica. The structured directory in Archivematica is the basic configuration of the transfer. If you just add a directory to the dashboard and start transfer processing, Archivematica will restructure it so it complies with this structure. There should be three subdirectories: logs, metadata, objects. The objects directory contains the digital objects that are to be preserved. The metadata directory contains the checksum, the METS file, and a submissionDocumentation subfolder, which can be used for transfer forms, donation agreements or any other documents that relate to the acquisition of the records. The logs folder will eventually contain logs generated when processing the transfer in Archivematica. You can create subdirectories within objects if desired.
- Please do not include submission documentation that has non-standard characters in the filename, as submission documentation names are not sanitized. Any filenames other than plain ASCII names may cause errors in processing.
- Open the file browser by clicking on the Home folder on the Archivematica desktop.
- The structured directory should contain three subdirectories: logs, metadata, objects. Copy the digital files to be preserved into the objects directory. Note that you can create subdirectories within objects.
- Add submission documentation. In the transfer you have just created, navigate to the /metadata/ folder and add a /submissionDocumentation directory. Add files to that folder, such as donor agreements, transfer forms, copyright agreements and any correspondence or other documentation relating to the transfer. Any SIPs subsequently made from this transfer will automatically contain copies of this documentation.
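The directory layout described above can also be prepared with a short script before the transfer is started. The following is only a minimal sketch in Python, not part of Archivematica: the function names (create_structured_transfer, is_plain_ascii, non_ascii_names) and the example path are assumptions made for illustration. It creates the logs, metadata and objects subdirectories plus metadata/submissionDocumentation, and lists any file names that are not plain ASCII so they can be renamed before processing.

```python
from pathlib import Path


def create_structured_transfer(root):
    """Create the logs, metadata and objects subdirectories under root,
    plus metadata/submissionDocumentation. (Hypothetical helper.)"""
    root = Path(root)
    for sub in ("logs", "metadata", "objects"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "metadata" / "submissionDocumentation").mkdir(exist_ok=True)
    return root


def is_plain_ascii(name):
    """Return True if the file name contains only ASCII characters."""
    return name.isascii()


def non_ascii_names(directory):
    """Return the sorted names of entries under directory whose names
    are not plain ASCII. (Hypothetical helper.)"""
    return sorted(
        p.name for p in Path(directory).rglob("*") if not is_plain_ascii(p.name)
    )


if __name__ == "__main__":
    # The path below is an example only; use your own transfer source location.
    transfer = create_structured_transfer("my_transfer")
    problems = non_ascii_names(transfer / "metadata" / "submissionDocumentation")
    print("Non-ASCII submission documentation names:", problems)
```

Run the check after copying files into submissionDocumentation; an empty list means that no names need to be changed.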
Process the transfer
- In the Transfer tab of the dashboard, the transfer will appear with a bell icon next to it. This means that it is awaiting a decision by the user. (figure 3)
- Click on the micro-service to display jobs that have completed, including the one requiring action.
- In the Actions drop-down menu, select "Approve transfer" to begin processing the transfer (figure 3). You may also "Reject transfer" and quit processing.
- The transfer will now run through a series of micro-services. These include:
- Verify transfer compliance (verifies that the transfer is properly structured - i.e. with the logs, metadata and objects folders)
- Rename with transfer UUID (assigns a universally unique identifier to the transfer as a whole; directly associates the transfer with its metadata)
- Assign file UUIDs and checksums to objects (assigns a universally unique identifier and SHA-256 checksum to each file in the /objects directory)
- Verify transfer checksums (verifies any checksums included with the transfer in its metadata directory)
- Generate METS.xml document (creates a METS file capturing the original order of the transfer. This METS file is added to any SIPs generated from this transfer)
- Quarantine (quarantines the transfer for a set duration, based on preconfiguration settings in the Administration tab of the dashboard, to allow virus definitions to update before a virus scan)
- Scan for viruses (scans for viruses and malware)
- Clean up file and directory names (removes prohibited characters from folder and filenames, such as ampersands)
- Identify file format (this is the identification that normalization will be based upon; the user can choose FIDO, identification by file extension, or skipping format identification at this stage). See Format identification below for more information.
- Extract packages (extracts contents from zipped or otherwise packaged files)
- Characterize and extract metadata (identifies and validates file formats; extracts technical metadata embedded in the files). If you have preconfigured it to do so, Archivematica will stop during this micro-service and allow the user to choose a file identification command from a dropdown menu. To learn about preconfigured options, please see Administrator manual 1.0 - Processing configuration. Archivematica's file identification default is set to identification by file extension. You can also choose to skip identification and run it later, during Ingest, instead.
- Complete transfer (includes indexing the transfer)
- A transfer that is in the middle of processing will show which micro-services have been completed (green) and which are in progress (orange).
- When a micro-service fails or encounters an error, the micro-service background turns from green to pink and a "failed" icon appears next to the transfer or SIP name. See Error handling for more information about how to handle an error.
- Once the transfer micro-services are completed, a bell icon will appear next to the transfer. This means that the transfer is ready to be packaged into a SIP for ingest or sent to a backlog, indexed and stored to be retrieved for processing at a later date (figure 4).
- Option 1: Select "Create single SIP and continue processing". (Note that Create SIP(s) manually is not currently functional. However, the ability to create one or more SIP(s) from one or more transfer(s) will return in Archivematica 1.1 with improved functionality in the web browser. See File browser requirements - Create SIP and Transfer and SIP creation workflows.)
- Option 2: Select "Send transfer to backlog". In this case, your transfer will be stored in a backlog in the same location as your AIP store so that you can retrieve one or more transfers from the Ingest tab for processing at a later date. See Managing a backlog.
- Option 3: Select "Reject the transfer".
- See Ingest for next steps.
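To illustrate what the "Assign file UUIDs and checksums to objects" micro-service described above produces, here is a minimal sketch in Python. It is not Archivematica's own code: the function names (sha256_of, assign_ids_and_checksums) and the shape of the returned dictionary are assumptions made for illustration, and the use of random (version 4) UUIDs is likewise an assumption. It walks an objects directory and records a UUID and a SHA-256 checksum for each file.

```python
import hashlib
import uuid
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assign_ids_and_checksums(objects_dir):
    """Map each file's path (relative to objects_dir) to a new UUID and
    its SHA-256 checksum. (Hypothetical helper, for illustration only.)"""
    objects_dir = Path(objects_dir)
    records = {}
    for path in sorted(objects_dir.rglob("*")):
        if path.is_file():
            records[path.relative_to(objects_dir).as_posix()] = {
                "uuid": str(uuid.uuid4()),  # assumption: random UUIDs
                "sha256": sha256_of(path),
            }
    return records


if __name__ == "__main__":
    # Example path only; point this at the objects directory of a transfer.
    for name, record in assign_ids_and_checksums("my_transfer/objects").items():
        print(name, record["uuid"], record["sha256"])
```

A checksum recorded this way can later be recomputed and compared to detect whether a file has changed since transfer.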
Format identification
By default, Archivematica lets the user choose a format identification option during transfer and then bases normalization in ingest on those results. However, you can set your preconfiguration options to do the opposite (skip identification at transfer and identify before normalization), or to allow user choice in the dashboard at both transfer and ingest.
Artefactual included the ability to skip identification at transfer and/or to change the identification tool before normalization mainly to allow for the possibility that content in the transfer backlog may contain formats for which there are not currently entries in the FPR. While the transfers are in the backlog, you can add rules to the FPR for the format(s) that were not identified or identifiable at the time of transfer, so that, when they are processed through ingest, all formats will be identified and normalization attempted based on those identifications.
There may be other use case scenarios in the future that this configuration flexibility facilitates. In general, we aim to include as much flexibility as possible when it comes to workflow choices, so that the archivist is as central as possible to AIP and DIP processing, rather than hardcoding and automating so much that the archivist is left with less influence on ingest.
Format identification is logged as a PREMIS event in the METS.xml using the results of running FITS tools (DROID, in particular), not the results of the tool selected to base normalization upon. This will change in coming releases.