Use Git submodules to install a private, custom Python package in a Docker image
This is a complex title, but I swear it’s not difficult
Wow, that title contains a lot of terms! Simply put, this article shows you an easy way to share your Python code privately and even run it in a Docker container. In the end you’ll be able to distribute private packages that run everywhere and are easy to maintain and update. Let’s code!
Preparation
You’ve created a Python script with all kinds of handy functions that you want to share with others. You have two options for approaching this:
- Package your code and distribute it publicly on PyPI as detailed in this article. This means that anyone can install the package by just calling pip install yourpackage, just like e.g. pandas.
- Package your code and distribute it privately as detailed in this article. This means that only certain people, like coworkers in your company, are able to install the package.
We’re going for the second option, making sure that our code stays private. We’ll build upon the article linked in the second option, so make sure to read it first. After reading it you’ll have a Python package that you can pip install from your Git repository.
The problem
If you’ve followed the instructions in that article, you have a package in your Git repo that you can install with pip install git+https://github.com/mike-huls/toolbox.git. In this example we’ll imagine that we’re creating a Python project that uses the toolbox package from the article.
We would like to run our project in a Docker container. To build the Docker image we first have to copy our source code into the image and then install all of the packages our project uses. We do this by providing a list of those packages, which you can generate with pip freeze > requirements.txt. The generated requirements.txt contains the name and version of every package installed in your environment.
The problem is that our privately installed package ends up in this list too. If it’s listed by name and version, pip searches for toolbox on PyPI and cannot find it there, since we installed it privately from our Git repo. We also cannot use our git+[GIT_REPO_URL] (newer pip versions may even write this form into requirements.txt for you), since our Docker image has no credentials to log into Git. There is a way to solve this problem using SSH keys, but in my opinion that is very hard to manage. There’s a much simpler option that lets your coworkers get the code without the hassle of generating or distributing SSH keys.
The solution
We’re going to use git submodules to pull the package into our project directory, install it and then modify our Dockerfile to install the package into our Docker image. Here’s how:
1 Add the submodule
Go to your project root and run:
git submodule add https://github.com/mike-huls/toolbox _submodules/toolbox
(For illustrative purposes I’m using a public repository here.)
This will create a folder called _submodules in your root that contains another folder with your Python package.
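git also records each submodule in a .gitmodules file in your project root, and you can list its contents with git config --file .gitmodules --list. As a dependency-free illustration, here’s a minimal Python sketch that parses the simple layout git submodule add writes. The function name parse_gitmodules is hypothetical, and the parser only assumes that generated layout, not the full git config syntax.

```python
from __future__ import annotations

import re

# Hypothetical helper: parse the .gitmodules file that `git submodule add`
# writes into the project root. Only the simple generated layout is handled:
#   [submodule "name"]
#       path = ...
#       url = ...
SECTION = re.compile(r'^\s*\[submodule\s+"(?P<name>[^"]+)"\]\s*$')
KEY_VALUE = re.compile(r"^\s*(?P<key>[A-Za-z][\w-]*)\s*=\s*(?P<value>.*?)\s*$")


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Return {submodule name: {key: value}} for every [submodule "..."] section."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        pair = KEY_VALUE.match(line)
        if pair and current is not None:
            current[pair.group("key")] = pair.group("value")
    return modules


if __name__ == "__main__":
    sample = (
        '[submodule "_submodules/toolbox"]\n'
        "\tpath = _submodules/toolbox\n"
        "\turl = https://github.com/mike-huls/toolbox\n"
    )
    print(parse_gitmodules(sample))
```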
2 Install the package
Simply pip install the package by executing: pip install _submodules/toolbox
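If you end up with several private packages under _submodules, you could script this step. The sketch below is a hypothetical helper (find_submodule_packages and pip_install_command are made-up names) that assumes every package folder contains a pyproject.toml or setup.py. It also flags empty folders, which is what a submodule looks like in a fresh clone before git submodule update --init has run.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Hypothetical helper: pip-install every package checked out under _submodules/.
# Assumption: a folder is installable if it contains pyproject.toml or setup.py.
PROJECT_FILES = ("pyproject.toml", "setup.py")


def find_submodule_packages(root: str | Path = "_submodules") -> tuple[list[Path], list[Path]]:
    """Split the folders in `root` into (installable, empty_or_unrecognised)."""
    installable: list[Path] = []
    skipped: list[Path] = []
    root = Path(root)
    if not root.is_dir():
        return installable, skipped
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if any((folder / name).is_file() for name in PROJECT_FILES):
            installable.append(folder)
        else:
            # e.g. a fresh clone where `git submodule update --init` hasn't run yet
            skipped.append(folder)
    return installable, skipped


def pip_install_command(folder: Path) -> list[str]:
    """Build the equivalent of `pip install <folder>` for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", str(folder)]


if __name__ == "__main__":
    packages, skipped = find_submodule_packages()
    for folder in skipped:
        print(f"Skipping {folder}: no pyproject.toml or setup.py (submodule not checked out?)")
    for folder in packages:
        subprocess.run(pip_install_command(folder), check=True)
```

Calling pip through sys.executable -m pip makes sure the packages land in the interpreter (and virtual environment) that runs the script.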
3 Create the Dockerfile
When we want to create our Docker image, it’s a simple matter of repeating the previous steps inside the image: copy the submodule folder into the image and then run pip install on it again.
FROM python:3.8
WORKDIR /app
# Install regular packages
COPY requirements.txt .
RUN pip install -r requirements.txt
# Install submodule packages
COPY _submodules/toolbox _submodules/toolbox
RUN pip install _submodules/toolbox --upgrade
# copy source code
COPY ./ .
# command to run on container start
CMD [ "python", "./app.py"]
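With more than one submodule, the install section of the Dockerfile grows by a COPY and RUN pair per package. Here’s a minimal sketch, assuming your submodule paths are kept in a hypothetical SUBMODULES list, that renders those lines in the same form as the “# Install submodule packages” block of the Dockerfile above, so you can paste the output in its place.

```python
from __future__ import annotations

# Hypothetical helper: render the "Install submodule packages" section of the
# Dockerfile for any number of submodules.
SUBMODULES = ["_submodules/toolbox"]  # assumption: paths relative to the project root


def submodule_install_lines(paths: list[str]) -> str:
    """Return one COPY + RUN pair per submodule path, mirroring the Dockerfile above."""
    lines = ["# Install submodule packages"]
    for path in paths:
        path = path.rstrip("/")  # "_submodules/toolbox/" and "_submodules/toolbox" are the same folder
        lines.append(f"COPY {path} {path}")
        lines.append(f"RUN pip install {path} --upgrade")
    return "\n".join(lines)


if __name__ == "__main__":
    print(submodule_install_lines(SUBMODULES))
```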
That’s it! Now you have a central repository that acts as the single source of truth. You can submit issues there, add feature requests, collaborate and push updates easily. Other benefits are:
Usability
Your programmers can keep installing the package the easy way, with pip install git+REPO_URL; that just isn’t enough to build the image. Then, when the code is ready to be built into an image, including the package is as simple as pulling the submodule (for example with git submodule update --init in a fresh clone) and adding it to the Dockerfile.
A clean repo
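In addition, git keeps track of your submodules for you. Your project repo only records the URL of the toolbox repo (in .gitmodules) and the commit it points to, not the package’s files, so you don’t ‘pollute’ your project repo. Be aware that pip doesn’t know about submodules: if you call pip freeze > requirements.txt, toolbox will still be listed (by name and version, or typically as a local file:// reference in newer pip versions), so remove that line before building the image.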
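If toolbox does end up in requirements.txt, pip install -r requirements.txt inside the Docker build will try to resolve it (on PyPI, or at a local path that doesn’t exist in the image) and fail. Here’s a minimal sketch of a hypothetical helper, strip_private, that drops the lines for a given set of private packages. The package name toolbox and the command-line usage are assumptions.

```python
from __future__ import annotations

import re
import sys

# Hypothetical helper: remove the lines for packages you install from submodules,
# so `pip install -r requirements.txt` doesn't try to resolve them in the image.
PRIVATE_PACKAGES = {"toolbox"}  # assumption: the distribution names of your private packages

# A requirement line starts with the distribution name (letters, digits, ".", "_", "-").
NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: case-insensitive, runs of '-', '_' and '.' are equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_private(requirements: str, private: set[str]) -> str:
    """Return `requirements` without the lines that refer to a package in `private`."""
    wanted = {normalize(p) for p in private}
    kept = []
    for line in requirements.splitlines():
        match = NAME.match(line)
        # matches "toolbox==0.1" as well as "toolbox @ file:///.../_submodules/toolbox";
        # comments and option lines such as "-e ..." don't start with a name and are kept
        if match and normalize(match.group(1)) in wanted:
            continue
        kept.append(line)
    return ("\n".join(kept) + "\n") if kept else ""


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical file name): python strip_private.py requirements.txt
    path = sys.argv[1]
    with open(path) as fh:
        cleaned = strip_private(fh.read(), PRIVATE_PACKAGES)
    with open(path, "w") as fh:
        fh.write(cleaned)
```

Running it right after pip freeze > requirements.txt keeps the file installable inside the image, while the Dockerfile installs toolbox from the copied submodule folder instead.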
Easy updates
It’s very easy to update the package we’re using in our project; just execute:
git submodule update --remote --merge
pip install _submodules/toolbox --upgrade
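This first pulls the latest code into the package folder and then upgrades the installed package. Afterwards, commit the updated submodule reference in your project repo so that your coworkers and your Docker builds use the same version. Don’t forget to use a virtual environment!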
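To turn the update into a single command, you could wrap both steps in a small script. This is a minimal sketch with hypothetical names (update_commands, in_virtualenv), assuming the submodule path from step 1. It only runs when you pass --run, and it refuses to upgrade a global Python install, in the spirit of the virtual-environment advice above.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical helper: the two manual update commands as one script.
SUBMODULE_PATH = "_submodules/toolbox"  # assumption: the path used in `git submodule add`


def update_commands(path: str) -> list[list[str]]:
    """Pull the latest submodule code, then reinstall the package from `path`."""
    return [
        ["git", "submodule", "update", "--remote", "--merge"],
        [sys.executable, "-m", "pip", "install", path, "--upgrade"],
    ]


def in_virtualenv() -> bool:
    # inside a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the base interpreter
    return sys.prefix != sys.base_prefix


if __name__ == "__main__" and "--run" in sys.argv:
    if not in_virtualenv():
        sys.exit("Refusing to upgrade: activate your virtual environment first.")
    for command in update_commands(SUBMODULE_PATH):
        subprocess.run(command, check=True)  # stop at the first failing step
```

The in_virtualenv check only recognizes venv-style environments; a conda environment, for example, won’t pass it, so adjust it to your setup.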
Conclusion
Custom, private Python packages are easy to install, maintain, update and distribute. Using this easy fix to build them into a Docker image makes them even more amazing.
I hope my explanation in this article was clear. If you have suggestions or clarifications, please comment so I can improve this article. In the meantime, check out my other articles on all kinds of programming-related topics. Happy coding!
— Mike
P.S: Like what I’m doing? Follow me!