System Hardware
Transfer Hardware: Private Records
Transfer Workflow
Ingest Hardware Requirements and Architecture
Option 1: External hard disks
External hard disks offer a cheap, scalable, portable and flexible way to transfer and process digital objects.
These articles on Tom's Hardware are very useful for our analysis of off-the-shelf hard disks: [1] [2] [3]
See also these articles on external disk enclosures: [4] [5]
Multi-disk enclosures: [6]
Data transfer rates
- Speed matters because it determines the total time required on-site.
- USB 2.0 has a max rate of 60MB/s; typically the max attainable rate is ~40MB/s though, with 20-30MB/s being more reasonable. At ~40MB/s that works out to ~140GB/hr, or 6.95 hours for one TB.
- to fill a 1TB drive ~12h or more
- Firewire - actual throughput rates seem to depend on both the number and size of files being transferred. Anywhere from 20-40% better than USB 2.0, depending on circumstances.
- to fill a 1TB drive ~5h
- eSATA has been clocked at a max of 130MB/s in some posted tests. See [7]
- eSATA will probably be faster but will likely require a card to be added to the motherboards of the source computers. So external hard disk transfer will most likely happen via USB.
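The hours-per-TB figures above follow from simple arithmetic, which the short sketch below reproduces. It assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and a constant sustained rate; the 25MB/s entry is only the midpoint of the 20-30MB/s range quoted above, not a measured value, and the function name is ours.

```python
def hours_to_transfer(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at a constant rate (decimal units)."""
    seconds = (size_gb * 1000) / rate_mb_per_s
    return seconds / 3600


# Sustained rates in MB/s, taken from the list above.
# 25 is an assumed midpoint of the 20-30MB/s range, not a measurement.
rates = {
    "USB 2.0 (best attainable)": 40,
    "USB 2.0 (assumed typical)": 25,
    "eSATA (max in posted tests)": 130,
}

for label, rate in rates.items():
    print(f"{label}: {hours_to_transfer(1000, rate):.1f} h per TB")
```

Real transfers of many small files will usually be slower than a constant-rate estimate suggests, so these numbers are best read as lower bounds.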
Option 2: Connect via Ethernet switch
- connect a transfer PC to the source network's Ethernet switch
- the transfer PC has multiple external hard disks connected to it via eSATA ports on its motherboard
- assuming the source network uses standard 100Mb Ethernet (ideally it has 1Gb, but that is less likely), transfer would be more centralized (all going through one transfer PC) than multiple USB-connected hard disks. Note that 100Mb Ethernet tops out at about 12.5MB/s, which is below the USB 2.0 rates above, so a speed advantage would require 1Gb Ethernet.
Data transfer tool
- Ideally, it is able to carry on copying while flagging files with issues to avoid stopping the process on a file-by-file basis.
- Rsync
- alternative: Unison
- Requires getting admin privileges on source machines (so that we can install copy tool)
Transfer testing
After testing in early January 2010, the DA team concluded that we would need to format and partition external drives for the system that we are transferring from. Roughly half of our external transfer drives will be formatted for Mac OS X transfers and the rest for Windows PCs. The following instructions, by Austin, illustrate our preferred methods:
- formatting external drives with gparted (hfs+/ntfs):
The same round of testing resulted in the following instructions, by Austin, for copying directories from a Windows machine. Mac instructions are forthcoming.
- recursive md5sum integrity check, rsync-cygwin windows, and md5sum audit with ubuntu:
- IMPORTANT: see initial test results: Mac Darwinports/hashdeep test
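For readers without access to the linked instructions, the idea behind a recursive checksum audit can be sketched in a few lines. This is not Austin's procedure (which uses md5sum and hashdeep); it is a stand-in written with Python's standard library, and the function names are our own.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of one file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/' separated) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = md5_of_file(full)
    return manifest


def compare_manifests(source, copy):
    """Return (files missing from the copy, files whose digests differ)."""
    missing = sorted(set(source) - set(copy))
    changed = sorted(p for p in source if p in copy and source[p] != copy[p])
    return missing, changed
```

A typical audit would call build_manifest once on the source directory and once on the copy, then pass both results to compare_manifests; two empty lists mean every source file has an identical counterpart on the transfer drive.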
Data transfer tests
21-12-09
- using cwRsync
- copied full NTFS formatted C: drive of Artefactual office Windows PC via USB2 to a root directory on a FAT32 formatted, 1 TB, 3.5" external hard disk using a NexStar3 hard disk enclosure
- Command used:
rsync.exe -avc "/cygdrive/c/" "/cygdrive/e/"
- did not run using log option. Command line output at end of process:
sent 31798118359 bytes received 2154449 bytes 911117.34 bytes/sec
total size is 33256040474 speedup is 1.05
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(139) [sender=3.0.6]
- Windows Explorer properties info:
- E:\windowsbackup (external hard disk copy)
Size: 23.6 GB (25,394,095,429 bytes)
Size on disk: 23.8 GB (25,630,216,192 bytes)
Contains: 98,365 Files, 28,626 Folders
- C:\ (source system)
Used space: 30,927,818,752 bytes 28.8 GB
- Issues:
- need to turn on logging
- Windows command prompt terminal doesn't allow for scrolling all the way back through all the output (to locate errors)
- why does Rsync output say only 2154449 bytes received but total size is 33256040474?
- difficult to compare source & target using Windows Explorer because it does not give the same properties info for a Disk (C:\) and a directory (E:\windowsbackup)
- not sure how long it took. Began processing when leaving office at 5pm, Dec 21. Was finished when back at 9am on Dec 22.
- reformatted the external disk right after the test, so did not get a chance to run the same command again.
- rsync will not copy any files twice. However, it will run the checksums again and may fix the errors received in the first transfer. Need to try a second pass if the same issue occurs in the next test
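Two of the open questions above can be partly answered from the summary lines alone. As far as we understand rsync's statistics, "sent" and "received" are counted from the sending side, so "received" is only the protocol traffic coming back from the receiver, while "total size" is the sum of the sizes of all files in the transfer set. The sketch below parses the summary and, assuming the reported bytes/sec is the combined sent and received bytes divided by elapsed time, estimates how long the run took. The variable and function names are ours.

```python
import re

# Summary lines copied from the command line output above.
SUMMARY = (
    "sent 31798118359 bytes received 2154449 bytes 911117.34 bytes/sec\n"
    "total size is 33256040474 speedup is 1.05"
)


def parse_summary(text):
    """Extract (sent, received, bytes_per_sec, total_size) from rsync's summary."""
    sent, received, rate = re.search(
        r"sent (\d+) bytes\s+received (\d+) bytes\s+([\d.]+) bytes/sec", text
    ).groups()
    total = re.search(r"total size is (\d+)", text).group(1)
    return int(sent), int(received), float(rate), int(total)


sent, received, rate, total = parse_summary(SUMMARY)
transferred = sent + received

# Assumption: rate = (sent + received) / elapsed seconds.
elapsed_hours = transferred / rate / 3600

# Speedup is total file size divided by bytes actually moved over the link.
speedup = total / transferred

print(f"elapsed ~ {elapsed_hours:.1f} h, speedup ~ {speedup:.2f}")
```

The recomputed speedup rounds to the 1.05 that rsync printed, and the estimated duration of a little under ten hours fits inside the 5pm to 9am window noted above.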
Next Test
- with the next test we should try dropping the -a flag.
- the -a flag is shorthand for the following options (-rlptgoD): recurse into subdirectories, copy symlinks as symlinks, retain file permissions, retain file time stamps, retain group ownership, retain owner, and preserve devices.
- Some of these options are geared to Linux systems and may have caused an issue while copying.
- if you drop the -a flag, the -r flag has to be used for recursion
- The following command could be used for the next test:
rsync.exe -rvci --log-file="/cygdrive/e/backup.log" "/cygdrive/c/Documents and Settings" "/cygdrive/e/documentsbackup"
- If errors are reported after the backup, the same command should be run again. Rsync will then walk the drive, compare checksums against the backup, and attempt to update any files that produced errors.
- It should be noted that backing up the entire C:\ drive includes all files used by the Windows OS, including temporary files and swap/page files. Because of this, it is unlikely that rsync will complete without some inconsistencies. However, if we just need to back up the users' home directories, this should not be a problem.
- ...actually, we need to simulate the copying of 1/3 of top-level directories for a given Drive.
- Does this mean we repeat the command line for each top-level directory?
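Not necessarily: rsync accepts several source paths before the destination, so one command can cover a chosen subset of top-level directories. The sketch below only builds and prints such a command; it does not run it. The directory names other than "Documents and Settings", the target path, and the helper name are hypothetical placeholders.

```python
import shlex


def build_rsync_command(source_root, top_level_dirs, target, log_file):
    """Build one rsync argument list covering several top-level directories."""
    # No trailing slash on each source, so rsync recreates the directory
    # itself (not just its contents) under the target.
    sources = [f"{source_root}/{name}" for name in top_level_dirs]
    return ["rsync.exe", "-rvci", f"--log-file={log_file}", *sources, target]


# Hypothetical selection of roughly 1/3 of the top-level directories on C:.
command = build_rsync_command(
    "/cygdrive/c",
    ["Documents and Settings", "Program Files"],
    "/cygdrive/e/windowsbackup/",
    "/cygdrive/e/backup.log",
)

# shlex.join quotes the arguments that contain spaces for display.
print(shlex.join(command))
```

Keeping the arguments as a list sidesteps quoting problems with names that contain spaces if the command is later handed to something like subprocess.run.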