Working on a Remote Server Effectively and Efficiently

With the large increase in people working from home and the general popularity of research using computationally heavy machine learning algorithms, working on remote servers has become significantly more popular. Setting up your system for remote work the right way can therefore save you a great deal of time. Because of this, I decided to write down my system for working on a remote server, in the hope that some people might find it useful.

To give you a little more background, here is my typical workflow. Currently, I mostly use Python for my development work, and I write, test and run my code on a remote server which needs to be accessed through a firewall. I develop code in Jupyter Lab and migrate finalised pieces of code to separate Python files, which form the core of my software package and are sorted by their function in the data analysis process (and of course I use Git for version control). So for me, the most important things I need to do on a remote server are: running scripts on the server through an SSH tunnel, editing files on the remote server, and running a Jupyter Lab session on the server. This functionality is what my setup is built around.

Setting up your SSH configuration

Firstly, while you can access your remote machines quite easily from the command line, a lot of time can be saved by correctly setting up an SSH configuration file. In doing this, you can define settings for each of your remote hosts once and make your SSH commands much simpler. This SSH configuration file exists (or should be created) as the ~/.ssh/config file on your machine. A typical configuration file will have entries for each of the remote hosts that look like this:

Host <remote-host-nickname>
  User <remote-username>
  HostName <remote-hostname>

Here, we have defined an alias for our remote host at the address <remote-hostname> where you would log in under the username <remote-username>. To expand the functionality of our SSH tunnels we will use two configuration options.

The ProxyJump option makes it easy to hop to your remote host via a firewall or gateway server at the address <gateway-hostname>: it names another host to connect through first. Any port forwarding configured for the destination is carried through this extra host as well. We will also set up an alias in our configuration file for this gateway host to make things easier.

The LocalForward option forwards connections made to a port on your local machine to a port on the remote host, and can therefore be used to make web servers running on the remote host reachable from your local machine. We will forward a port on your local machine (<local-port>) to a port on the remote host (<remote-port>). I typically choose the same number for local and remote, but you might want them to be different. The <remote-port> is going to be the port that a web server (e.g. Jupyter Lab) is running on. Since the server is often shared between several people and a port can only be used by one process at a time, you will have to pick one that is not in use by anyone else. Ports below 1024 are reserved for system services, but if you choose a number between 8000 and 9000 you should be fine.
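To check which ports are actually free before picking one, something like the following could be run on the remote host. Treat it as a rough sketch: the function names port_in_use and find_free_port are my own, it relies on bash's /dev/tcp feature, and it only tells you what is listening on localhost at the moment you run it.

```shell
#!/usr/bin/env bash
# Sketch: find the first port in a range that nothing is listening on.
# Relies on bash's /dev/tcp pseudo-device (a bash feature, not a real file).

# Succeeds (exit 0) if something accepts TCP connections on 127.0.0.1:<port>.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints the first free port between $1 and $2 (inclusive).
# Returns non-zero if every port in the range is taken.
find_free_port() {
    local port=$1
    while [ "$port" -le "$2" ]; do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    return 1
}

# Example: search the range suggested above and print the result.
find_free_port 8000 9000
```

Someone else could still grab the port between the check and the moment you start your server, so this is a convenience rather than a guarantee.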

Combining these two options we can configure the SSH tunnel in the ~/.ssh/config file:

### gateway
Host <gateway-host-nickname>
  User <gateway-username>
  HostName <gateway-hostname>

### endpoint
Host <remote-host-nickname>
  User <remote-username>
  HostName <remote-hostname>
  ProxyJump <gateway-host-nickname>
  LocalForward <local-port> localhost:<remote-port>

Here, we have defined a remote host that connects via a firewall or gateway server and that also forwards a remote port to our local machine.

After this you should be able to ssh into the remote host by using:

ssh <remote-host-nickname>
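
For comparison, the two configuration entries stand in for a much longer command line, using the -J (jump host) and -L (local forward) flags of ssh. The small helper below only prints that long form rather than running it; long_form_ssh and the example logins and ports are made up for illustration.

```shell
# Prints the explicit ssh command that the gateway and endpoint entries replace.
# Arguments: gateway login, endpoint login, local port, remote port.
long_form_ssh() {
    printf 'ssh -J %s -L %s:localhost:%s %s\n' "$1" "$3" "$4" "$2"
}

# Hypothetical values, for illustration only.
long_form_ssh alice@gateway.example.org alice@compute.example.org 8888 8888
```

With the configuration file in place, all of this collapses into the short alias.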

Setting up SSH keys

When logging into your remote machine, you will be prompted for your password, and this can become tedious, especially when jumping through a firewall. This typically causes people to pick short, easy-to-remember passwords, which are generally unsafe. A better approach is to use SSH keys, which let us authenticate on the remote machine with a cryptographic key pair. Unfortunately, not all servers allow this type of authentication, so this might not work for your particular configuration. To generate a key pair using the standard settings, we can use the following command:

ssh-keygen

It will give you a prompt which looks like this:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:aDjkhs6wZsouL7XKPKXWhzzsgpEzsAqwi6z+BZ4n93Y user@remote-host
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|    .            |
|o  + . .         |
|++..= o S        |
|O=oooo           |
|BOO=.+           |
|#*.*=... E       |
|XX*oo ...        |
+----[SHA256]-----+

This gives you a key pair which you can use to authenticate on a remote machine. I recommend protecting the SSH key with a good passphrase, which will still be easier to remember than all the different passwords for all of the different remote hosts.
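
If you would rather not rely on the defaults, the key type and size can be given explicitly. The wrapper below is only a sketch under my own assumptions: generate_key is a made-up name, and the choice of a 4096-bit RSA key is one common option, not something your server necessarily requires.

```shell
# Generates an RSA key pair at the given path. Any extra arguments are passed
# straight to ssh-keygen. Without -N, ssh-keygen prompts for a passphrase.
generate_key() {
    key_file=$1
    shift
    ssh-keygen -t rsa -b 4096 -f "$key_file" "$@"
}

# Example (the comment text after -C is hypothetical):
#   generate_key ~/.ssh/id_rsa -C "laptop key"
```

The -C flag only attaches a label to the public key, which helps when you later need to tell several keys apart on the server.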

You can then copy your public key to the machines by using:

ssh-copy-id -i ~/.ssh/id_rsa <gateway-host-nickname>
ssh-copy-id -i ~/.ssh/id_rsa <remote-host-nickname>

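After this, you should be able to SSH into the server without having to fill in your account password. If you protected the key with a passphrase, you will be asked for that instead, unless a key agent such as ssh-agent is holding the unlocked key for you.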

Accessing and Editing Files on the Server

There are various ways of syncing files between two different machines (using git to push and pull code, scp, or even rsync), but the method I find easiest is based on the sshfs command, which uses the SFTP protocol to mount a remote folder on your local file system. To set this up we link the folder <remote-host-folder> on the remote host to an existing folder <mountpoint> on our local machine by doing:

sshfs -o ServerAliveInterval=15 <remote-host-nickname>:<remote-host-folder> <mountpoint>

The extra option -o ServerAliveInterval=15 checks every 15 seconds whether the connection is still alive and disconnects automatically when the connection breaks. You can also unmount the folder manually. On Linux:

fusermount -u <mountpoint>

Or on macOS:

umount <mountpoint>
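
If you switch between Linux and macOS machines, the two unmount commands can be folded into one helper that picks the right one based on uname. This is only a sketch; the function names unmount_command and unmount_remote are my own.

```shell
# Prints the unmount command for the given OS name (default: this machine's).
unmount_command() {
    os=${1:-$(uname -s)}
    case "$os" in
        Linux)  echo "fusermount -u" ;;
        Darwin) echo "umount" ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Unmounts the given mount point with whichever command applies here.
unmount_remote() {
    cmd=$(unmount_command) || return 1
    # Left unquoted on purpose so that "fusermount -u" splits into two words.
    $cmd "$1"
}

# Example (hypothetical mount point):
#   unmount_remote ~/remote-project
```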

You can now view and edit the files from your local machine, and any changes you make are written directly to the remote machine. (Do note that if your directory has a lot of files, it will take a while to index all of them.)

Running things on the Remote Server

When running things on the remote server I recommend using a terminal multiplexer like screen or tmux. These apps enable you to open several virtual terminals over a single SSH connection. The virtual terminals also persist after the SSH connection is closed, which is ideal for long-running jobs or spotty internet connections. To get started, simply run:

screen

and it will create a new screen session for you. Commands are entered by first pressing Ctrl + a and then one of the keys below. Here are a few useful commands for operating within a screen session:

c   create new terminal
?   show all options
0   switch to terminal 0
1   switch to terminal 1 (etc.)
d   detach the session

If you detach the session (or just disconnect the SSH tunnel), you can later reattach the session by using:

screen -r

Using Jupyter Lab on a Remote Server

Now that we have everything in place, it is quite easy to start something like a Jupyter Lab/Notebook session. Using port forwarding (the LocalForward option) on our SSH tunnel we can reach the web server from our local machine. To make this work we start a Jupyter Lab server (in a screen session so that it persists) on the same port that we used for the port forwarding (<remote-port>).

jupyter lab --port=<remote-port> --no-browser
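
The two steps (opening a screen session and starting Jupyter Lab in it) can also be combined by asking screen to start the server in a detached, named session. The helper below is a sketch; start_jupyter_in_screen, the session name and the port in the example are my own choices.

```shell
# Starts Jupyter Lab inside a detached, named screen session.
# Arguments: session name, port (the <remote-port> from the SSH config).
start_jupyter_in_screen() {
    screen -dmS "$1" jupyter lab --port="$2" --no-browser
}

# Example with a hypothetical session name and port:
#   start_jupyter_in_screen jupyter 8888
#   screen -ls            # list running sessions
#   screen -r jupyter     # reattach to read the server log
```

With the server running and the SSH connection open, Jupyter Lab should be reachable in your local browser at http://localhost:<local-port>. By default Jupyter asks for a token, which it prints in its startup log, so reattaching to the session once is usually needed to copy it.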