Connecting a Private GitHub Repository with Google Colab via Deploy Key
Whether you have a Google Colab project that keeps growing or a finished machine learning project you want to run in Google Colab, at some point you will probably want to include a GitHub repository in it.
There are quite a few tutorials explaining how to do this with public GitHub repositories, but they do not work with private repositories. The solution I am presenting here has several advantages:
- You don’t need to connect your notebook with your Google Drive
- You don’t need to enter (or for God’s sake even save) passwords in your notebook
- You can easily use it with multiple collaborators and everyone can start a session at any time
- Once you set it up, cloning works fast and requires no user interaction
The key idea is that we create a GitHub deploy key and embed it in the notebook.
Warning: This method gives everyone who can see your notebook pull access (or even push access, depending on your configuration) to the private GitHub repo!
You can, however, revoke this access at any time by deleting the deploy key. I think we can all agree that Google Colab should get a proper Git integration, but until then, this is the best solution I know.
Creating the Deploy Key
A GitHub deploy key is just a normal asymmetric key. To create one, run
! ssh-keygen -t rsa -b 4096
in your notebook. Confirm the default filename (id_rsa) and leave the passphrase fields empty (you may need to focus the output part of the cell to be able to do so).
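If typing into the cell output is awkward, the prompts can be skipped by passing the filename and an empty passphrase on the command line. The following is a minimal sketch of that idea, not part of the original setup: the helper names `keygen_command` and `generate_key` and the comment label `colab-deploy-key` are made up for illustration, while `-t`, `-b`, `-N`, `-f`, `-C` and `-q` are standard `ssh-keygen` options.

```python
import subprocess
from pathlib import Path


def keygen_command(key_path, comment="colab-deploy-key"):
    """Build an ssh-keygen call that needs no interactive input."""
    return [
        "ssh-keygen",
        "-t", "rsa",
        "-b", "4096",
        "-N", "",             # empty passphrase, so no prompt
        "-f", str(key_path),  # private key file; public key goes to <key_path>.pub
        "-C", comment,        # free-text label appended to the public key
        "-q",                 # suppress progress output
    ]


def generate_key(key_path="/root/.ssh/id_rsa"):
    """Create the key pair and return the public key text."""
    key_path = Path(key_path)
    if key_path.exists():
        # ssh-keygen would ask before overwriting, so stop early instead.
        raise FileExistsError(f"{key_path} already exists")
    key_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(keygen_command(key_path), check=True)
    return Path(str(key_path) + ".pub").read_text()
```

Calling `print(generate_key())` in a cell should then create the key pair and show the public key in one step.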
You can access the generated public key by running
! cat /root/.ssh/id_rsa.pub
Now go to GitHub and open the private repository you want to be able to use in your Google Colab notebook. Go to “Settings > Deploy keys” and click “Add deploy key”. Name your key and copy and paste the content of id_rsa.pub into the key field.
You need to decide if you want the deploy key to have write access. If you enable this, you can push changes from Google Colab back to the GitHub repository. However, this will also give everyone who had a look at your notebook push access to the GitHub repository.
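As an optional sanity check, you can compare the fingerprint of the key in your notebook with the one GitHub shows next to the saved deploy key. The sketch below assumes GitHub displays an OpenSSH-style SHA256 fingerprint there, and the function name `sha256_fingerprint` is my own. It computes the same value that `ssh-keygen -lf /root/.ssh/id_rsa.pub` prints.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Return the OpenSSH-style SHA256 fingerprint of a public key line."""
    # A public key line looks like "<type> <base64 blob> [comment]".
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

For example, `sha256_fingerprint(open("/root/.ssh/id_rsa.pub").read())` returns a string starting with `SHA256:` that should match the fingerprint listed under “Deploy keys”.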
You will need the private key for the next section. In order to show it, run
! cat /root/.ssh/id_rsa
Note: You can of course create the SSH key on your own computer. Just make absolutely sure you don’t use a relevant key for this, i.e. a key that you use for anything else.
Creating the Setup Cell
Now the only problem left is that the SSH key will disappear when the Google Colab session ends. To avoid this, we create a cell that adds the private key back to the session upon execution.
Whenever you start a new session in your notebook, just run the following cell. This will add the private key and clone the repository. It won’t need any user interaction.
Replace <private key>, <user name> and <repo name> with your actual values.
! mkdir -p /root/.ssh
with open("/root/.ssh/id_rsa", mode="w") as fp:
    fp.write("""<private key>""")
! ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
! chmod go-rwx /root/.ssh/id_rsa
! git clone git@github.com:<user name>/<repo name>.git
%cd /content/<repo name>
What is happening here?
- First create the .ssh directory to store the private key in
- Write the private key data to disk
- Add github.com to the list of known hosts
- Restrict access to id_rsa, otherwise git clone will complain and abort
- Clone the repository
- Change the working directory of the notebook to the base directory of the repository (note the %: this is not a shell command, but an IPython builtin)
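The first, second and fourth steps (creating the directory, writing the key and restricting access) can also be combined into a single Python helper. This is a sketch of one possible variant and not a replacement for the cell above. The function name `install_private_key` is hypothetical. It creates the file with owner-only permissions from the start and normalizes the trailing newline, since OpenSSH can reject a pasted key that does not end with one.

```python
import os
import stat
from pathlib import Path


def install_private_key(key_text, ssh_dir="/root/.ssh", name="id_rsa"):
    """Write key_text to <ssh_dir>/<name>, accessible by the owner only."""
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    key_path = ssh_dir / name
    # Drop whitespace picked up while pasting and end with exactly one newline.
    data = key_text.strip() + "\n"
    # Create the file with mode 0600 right away instead of fixing it afterwards.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fp:
        fp.write(data)
    # If the file already existed, os.open kept its old mode, so set it again.
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
    return key_path
```

In the setup cell, `install_private_key("""<private key>""")` would then stand in for the `mkdir`, `open(...)` and `chmod` lines. The `ssh-keyscan` and `git clone` lines stay as they are.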
You can now delete all cells you created in the “Creating the Deploy Key” section.
Tip: If there are changes in your repository during an active session, just add a cell with ! git pull to update.
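If you would rather have one cell that works both at the start of a session and later on, the choice between cloning and pulling can be made in Python. The sketch below assumes the repository lives under /content as in the setup cell. The names `sync_command` and `sync` are hypothetical, and `--ff-only` is my addition so that a pull never creates a merge commit in the Colab session.

```python
import subprocess
from pathlib import Path


def sync_command(user, repo, base_dir="/content"):
    """Return the git command that brings <base_dir>/<repo> up to date."""
    target = Path(base_dir) / repo
    if (target / ".git").is_dir():
        # Already cloned in this session: fast-forward to the remote state.
        return ["git", "-C", str(target), "pull", "--ff-only"]
    # Fresh session: clone over SSH so that the deploy key is used.
    return ["git", "clone", f"git@github.com:{user}/{repo}.git", str(target)]


def sync(user, repo, base_dir="/content"):
    """Run the command chosen by sync_command and raise if git fails."""
    subprocess.run(sync_command(user, repo, base_dir), check=True)
```

Running `sync("<user name>", "<repo name>")` after the private key and known_hosts entry are in place clones on the first call and pulls on every later one.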