Moving data between university HPCs

Here I outline a process for moving large amounts of data between two uni HPCs.

Recently I moved to a new university and found myself banging my head against the problem of transferring large files from my old uni’s High Performance Computing (HPC) system (‘SPARTAN’) to my new uni’s HPC (‘Lyra’). I found this somewhat tricky because you are dealing with two remote hosts that can’t directly communicate with one another. I wanted a solution that avoided manually downloading everything from the old HPC onto a local computer before re-uploading it to the new HPC.

Below is the process that worked for me. Note that this is Mac-specific (macOS Catalina 10.15.7).

  1. Install macFUSE and SSHFS from here.

  2. Go to System Preferences -> Users & Groups -> click the ‘+’ sign at the bottom left of the window -> set the New Account dropdown to ‘Group’ -> set Full Name to ‘fuse’ -> select the new group and tick the checkbox next to your user to mark yourself as a member.

  3. Make a folder where you will ‘mount’ the old HPC:

     mkdir ~/sshfs_mount
    
  4. Mount the relevant folder from your old HPC (enter your old HPC password when prompted):

     sshfs -o idmap=user OLD_USERNAME@OLD_HPC_ADDRESS:/OLD_PATH_TO_FOLDER ~/sshfs_mount
    
  5. Navigate to the mounted folder on your local computer (you don’t need to ssh into the old HPC).

     cd ~/sshfs_mount 
    
  6. Use rsync to transfer files from mounted folder to new HPC (in the example below I am transferring all files ending in ‘.bdf’ from the old HPC to the new HPC).

     rsync -rv --ignore-existing --progress *.bdf NEW_USERNAME@NEW_HPC_ADDRESS:NEW_PATH_TO_FOLDER
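
For convenience, steps 3 to 6 can be wrapped into one small script. This is only a minimal sketch, not a tested tool: the `run` and `transfer_bdf` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and every upper-case value is a placeholder you would need to replace. By default it only prints the commands, so you can check them before anything touches either HPC.

```shell
#!/usr/bin/env bash
# Sketch only: steps 3-6 wrapped in one function.
# Every OLD_*/NEW_* value is a placeholder, as in the steps above.

MOUNT_POINT="${MOUNT_POINT:-$HOME/sshfs_mount}"
OLD_REMOTE="${OLD_REMOTE:-OLD_USERNAME@OLD_HPC_ADDRESS:/OLD_PATH_TO_FOLDER}"
NEW_REMOTE="${NEW_REMOTE:-NEW_USERNAME@NEW_HPC_ADDRESS:NEW_PATH_TO_FOLDER}"
DRY_RUN="${DRY_RUN:-1}"  # hypothetical switch: 1 = print commands only, 0 = run them

# Print a command, then run it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

transfer_bdf() {
    # Step 3: create the local mount point (no error if it already exists).
    run mkdir -p "$MOUNT_POINT" || return 1
    # Step 4: mount the old HPC folder; stop here if the mount fails.
    run sshfs -o idmap=user "$OLD_REMOTE" "$MOUNT_POINT" || return 1
    # Steps 5-6: copy the .bdf files from the mounted folder to the new HPC.
    run rsync -rv --ignore-existing --progress "$MOUNT_POINT"/*.bdf "$NEW_REMOTE"
}

transfer_bdf
```

Once the printed commands look right, running the script with `DRY_RUN=0` set in the environment executes them for real, prompting for passwords as in the manual steps.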
    

Troubleshooting

To get macFUSE to work I had to go to System Preferences -> Security & Privacy -> General and allow loading system software from developer “Benjamin Fleischer”.

When I first tried mounting the remote folder I got this error message: 'remote host has disconnected'… this was because I’d mis-typed my old HPC address 🤦

Then when I tried remounting the remote I got this message: 'mount_macfuse: mount point /Users/willturner/sshfs_mount is itself on a macFUSE volume'

To fix that I simply had to run: diskutil umount force ~/sshfs_mount
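
If you hit the remounting error often, a small check before mounting may help. The sketch below uses two hypothetical helpers (`is_mounted` and `force_unmount_if_mounted` are names made up for illustration) and assumes the mount point shows up in the output of `mount` under its full path; it only calls `diskutil` when the folder really is mounted.

```shell
#!/usr/bin/env bash
# Sketch only: is_mounted and force_unmount_if_mounted are hypothetical helpers.

# Succeeds if the given absolute path appears as a mount point in `mount` output.
is_mounted() {
    mount | grep -qF " on $1 "
}

# Force-unmount the folder only if it is currently mounted (macOS `diskutil`).
force_unmount_if_mounted() {
    local mount_point="$1"
    if is_mounted "$mount_point"; then
        diskutil umount force "$mount_point"
    else
        echo "$mount_point is not mounted"
    fi
}

force_unmount_if_mounted "$HOME/sshfs_mount"
```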

If you’ve stumbled across this, I hope it was helpful. Please feel free to reach out and let me know if you’ve come across a better method for solving this task.

UPDATE (3/8/23)

The above method, whilst technically functional, is in practice agonisingly slow (with transfer speeds of Kb/s 🤦). To speed things up this is what I’ve been doing (if you have better ideas please let me know!):

1) I use a virtual machine (provided by my uni) as the intermediate ‘bridge’ between the two remotes, instead of my laptop.

2) On that computer (a Windows machine) I mounted my old HPC (following these instructions).

3) Then I used Cyberduck, connected to the new HPC, to upload the files off the mounted folder on the virtual machine. This massively increased the upload speed (although it does still take ages to transfer 250 GB of data… but luckily that can all chug away on the virtual machine in the background!).
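
To put the speed difference in perspective, here is a rough back-of-the-envelope estimate. The two transfer rates are purely hypothetical examples (I did not measure them), the `transfer_hours` helper is made up for illustration, and the arithmetic assumes decimal units with sizes and rates in bytes.

```shell
#!/usr/bin/env bash
# Rough estimate only: whole hours needed to move SIZE_GB at RATE_KB_S,
# using decimal units (1 GB = 1,000,000 KB) and integer division.
transfer_hours() {
    local size_gb="$1" rate_kb_s="$2"
    echo $(( size_gb * 1000000 / rate_kb_s / 3600 ))
}

transfer_hours 250 100     # hypothetical 100 KB/s: prints 694
transfer_hours 250 10000   # hypothetical 10 MB/s: prints 6
```

In other words, at kilobyte-per-second speeds a transfer of this size would take weeks, whereas at a few megabytes per second it becomes an overnight job.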