Using multi-stage builds to make your docker image 10x smaller
Clean up your Docker images by leaving behind unnecessary tools
In this short article we build an image in multiple stages to significantly reduce the size of our docker image. In the end we’ll have an image that does the exact same thing but is almost 10 times smaller in size! The way we do this is by leaving behind tools we use for building the image.
First we’ll create a Dockerfile for building the image in the “regular” way. Then we upgrade this file, making use of an extra stage and leaving behind our unnecessary artifacts. Lastly we’ll optimize further by experimenting with different Docker images. Let’s see how much fat we can trim off. Let’s code!
Before we start
We’re using a lot of terminal commands. Check out this article if you are unfamiliar.
Setup
Imagine we are building an app that needs geographical data. OpenStreetMap provides this geodata for us: via geofabrik.de we can download geodata per country in the .osm.pbf format. Next we can use a tool called Osmium to merge these files together. Imagine we only need this merged file to feed to our app.
In summary: we only need to use curl (for downloading) and osmium (for merging) once. When we obtain the merged file we don’t need these tools anymore. In a two-stage build we install these tools in the first stage, use them there and then only take the result (the merged file) to the next stage, leaving behind the tools.
Before we see the two-stage build let’s look at the “normal” way.
1. A normal build
In this part we’ll build our image in the simplest way, while keeping in mind some small tricks to minimize our image, as detailed in this article. Here’s the Dockerfile:
FROM ubuntu:20.04
# BUNDLE LAYERS
RUN apt-get update -y && apt-get install -y --no-install-recommends \
    curl \
    osmium-tool \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir /osmfiles \
    && mkdir /merged \
    && curl https://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
    && curl https://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
    && osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf
This Dockerfile does exactly what we need: install curl and osmium, download the files and merge them. We end up with the merged file that resides in /merged.
Notice that we don’t do anything after obtaining the merged file. For the purpose of this article we’ll keep this dockerfile simple and skip actually doing something with the file.
Let’s check out the size of this image:
testimg:latest a88a8848201b 16 seconds ago 96.9MB
As you see the image is around 97MB. In the next part we’ll upgrade our dockerfile with multiple stages.
2. Implementing a multi-stage build
In the previous example we’ve used our build-tools (curl and osmium) only once. After they are used they linger in our image. An analogy: you buy a new car but discover all tools that are used to build the car are still in the trunk, taking up precious space!
In this part we’ll focus on leaving behind the tools that we used to build the car. Let’s check out the new Dockerfile and go through the code.
FROM ubuntu:20.04 AS final
FROM ubuntu:20.04 as build
# BUNDLE LAYERS
RUN apt-get update -y && apt-get install -y --no-install-recommends \
    curl \
    osmium-tool \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir /osmfiles \
    && mkdir /merged \
    && curl http://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
    && curl http://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
    && osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf
FROM final
RUN mkdir /merged
COPY --from=build /merged /merged
As you see we use two stages: one called build and one called final. We install curl and osmium in our build stage, use them to create the merged file and finally just copy the merged folder to the final stage.
Once we type FROM final we’re “in” our final stage, so anything after FROM final happens in there. The same goes for FROM ubuntu:20.04 as build on line 2. Notice that curl and osmium are only installed in our build stage. Once we reach the end of our Dockerfile we only keep the stage that we’re in. Because the final stage is active at the end of the Dockerfile, everything in the build stage gets left behind. Let’s check out the size of our new image:
testimgbetter:latest 7342ee3948e8 3 seconds ago 75.1MB
Our new image is 75MB: we’ve saved over 20MB, or a little more than 20%, by leaving behind tools we don’t need anymore. This is already a very fine improvement. Let’s find out how we can optimize more.
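A note on layout: the same two-stage idea can also be written with the build stage first and the base image named directly in the last FROM, which is the ordering used in Docker’s own multi-stage documentation. The sketch below is an assumed equivalent of the Dockerfile above, not the file that produced the size quoted here, so its resulting size is untested.

```dockerfile
# Stage 1: the build stage is the only place where curl and osmium exist
FROM ubuntu:20.04 AS build
RUN apt-get update -y && apt-get install -y --no-install-recommends \
        curl \
        osmium-tool \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir /osmfiles /merged \
    && curl http://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
    && curl http://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
    && osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf

# Stage 2: the last FROM starts the stage that becomes the image
FROM ubuntu:20.04
# COPY creates /merged in this stage if it does not exist yet
COPY --from=build /merged /merged
```

Whichever ordering you pick, only the stage that is active at the end of the file ends up in the image.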
3. A multi-stage with a smaller final image
In the previous part we used the ubuntu:20.04 image for both the build and final stage. For the build stage this is pretty logical; here we do complex things: installing packages, downloading stuff, git cloning, in other cases maybe even using cmake or make to compile code! The final stage, however, does not have to be as smart as the build stage.
The idea of this part is that you need a big, rich image like ubuntu:20.04 for the complex build stage. Then we can copy the results from the build stage to a final stage that is based on a much simpler image. In our case we can even use a very small image like Alpine. Let’s adjust our Dockerfile a bit by changing the base image of the final stage:
FROM alpine:3.14 AS final
Now, once our build stage is done, we only copy our merged file into the Alpine stage. Let’s check out the total size of the newly built image:
testimgbetter latest 8ad3278671e1 17 minutes ago 7.95MB
We went from 75MB to a little under 8MB: a reduction of almost 90%!
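For reference, the whole Dockerfile for this part could look like the sketch below. It assumes that only the base image of the final stage changes and that the build stage stays exactly as in part 2.

```dockerfile
# Final stage base: a small image that only has to hold the merged file
FROM alpine:3.14 AS final

# Build stage: a richer image with the tools we need only once
FROM ubuntu:20.04 AS build
RUN apt-get update -y && apt-get install -y --no-install-recommends \
        curl \
        osmium-tool \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir /osmfiles \
    && mkdir /merged \
    && curl http://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
    && curl http://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
    && osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf

# Back in the final stage: take only the result, leave the tools behind
FROM final
RUN mkdir /merged
COPY --from=build /merged /merged
```

Copying a plain data file works here because the file does not depend on any libraries in the build image.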
Note that an Alpine image does not always work as the final stage; the point of this part is that the final stage can be based on a smaller image than the build stage.
Conclusion
In this article we’ve focused on understanding the way docker builds and minimizing docker images by removing all tools that we don’t need anymore. We can do this very efficiently with multiple stages: performing complex tasks like installing and building in a rich image, then copying the result to the slimmer final image and leaving behind the build-stage.
The examples in this article are pretty minimal, only leaving behind curl and osmium. There are examples where I’ve used a bulkier build stage and have saved over 1.5GB, going from an image size of 1.8GB to 0.3GB just by leaving behind artifacts that we don’t need anymore!
I hope this article was clear but if you have suggestions/clarifications please comment so I can make improvements. In the meantime, check out my other articles on all kinds of programming-related topics like these:
- Docker for absolute beginners
- Docker Compose for absolute beginners
- Turn Your Code into a Real Program: Packaging, Running and Distributing Scripts using Docker
- Why Python is slow and how to speed it up
- Advanced multi-tasking in Python: applying and benchmarking threadpools and processpools
- Write your own C extension to speed up Python x100
- Getting started with Cython: how to perform >1.7 billion calculations per second in Python
- Create a fast auto-documented, maintainable and easy-to-use Python API in 5 lines of code with FastAPI
Happy coding!
— Mike