Using multi-stage builds to make your docker image 10x smaller

Clean up your Docker images by leaving behind unnecessary tools

If you’re done building you need to clean up your tools (image by Cesar Carlevarino Aragon on unsplash)

In this short article we build an image in multiple stages to significantly reduce the size of our Docker image. In the end we’ll have an image that does the exact same thing but is more than 10 times smaller! We do this by leaving behind the tools we use for building the image.

First we’ll create a Dockerfile for building the image in the “regular” way. Then we upgrade this file, making use of an extra stage and leaving behind our unnecessary artifacts. Lastly we’ll optimize further by experimenting with different Docker images. Let’s see how much fat we can trim off. Let’s code!


Before we start

We’ll use a lot of terminal commands. Check out this article if you are unfamiliar with them.


Setup

Imagine we are building an app that needs geographical data. OpenStreetMap provides this geodata for us: via geofabrik.de we can download geodata per country in the .osm.pbf format. Next we can use a tool called Osmium to merge these files together. Imagine we only need this merged file to feed to our app.

In summary: we only need to use curl (for downloading) and osmium (for merging) once: when we obtain the merged file we don’t need these tools anymore. In a two-stage build we download these tools in the first stage, use them there and then only take the result (the merged file) to the next stage, leaving behind the tools.

Before we see the two-stage build let’s look at the “normal” way.


1. A normal build

In this part we’ll build our image in the simplest way, while keeping in mind some small tricks to minimize our image, as detailed in this article. Here’s the Dockerfile:

FROM ubuntu:20.04

# BUNDLE LAYERS
RUN apt-get update -y && apt-get install -y --no-install-recommends \
 curl \
 osmium-tool \
&& rm -rf /var/lib/apt/lists/*

RUN mkdir /osmfiles \
&& mkdir /merged \
&& curl https://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
&& curl https://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
&& osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf

This dockerfile does exactly what we need: install curl and osmium, download the files and merge them. We end up with the merged file that resides in /merged.

Notice that we don’t do anything after obtaining the merged file. For the purpose of this article we’ll keep this dockerfile simple and skip actually doing something with the file.

Let’s check out the size of this image:

testimg:latest        a88a8848201b   16 seconds ago       96.9MB

As you see the image is around 97MB. In the next part we’ll upgrade our dockerfile with multiple stages.
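A listing like the one above can be reproduced with a small helper along these lines. This is only a sketch: it assumes the Dockerfile sits in the current directory and that the image is tagged testimg as in the listing. The function name build_and_report and the DOCKER variable are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): build an image and print its size.
# DOCKER can be overridden with a compatible CLI; it defaults to docker.
DOCKER="${DOCKER:-docker}"

build_and_report() {
    tag="$1"            # image name, e.g. testimg
    context="${2:-.}"   # build context directory that contains the Dockerfile
    # Build the image and stop if the build fails.
    "$DOCKER" build -t "$tag" "$context" || return 1
    # List the image; the last column of the output is its size.
    "$DOCKER" images "$tag"
}

# Usage (requires a running Docker daemon):
#   build_and_report testimg .
```

Running the same helper after each change to the Dockerfile makes it easy to compare the sizes reported in the rest of this article.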

Cleaning up a lot of garbage in this article (image by Jeremy Bezanger on unsplash)

2. Implementing a multi-stage build

In the previous example we’ve used our build-tools (curl and osmium) only once. After they are used they linger in our image. An analogy: you buy a new car but discover all tools that are used to build the car are still in the trunk, taking up precious space!

In this part we’ll focus on leaving behind the tools that we used to build the car. Let’s check out the new Dockerfile and go through the code.

FROM ubuntu:20.04 AS final
FROM ubuntu:20.04 as build

# BUNDLE LAYERS
RUN apt-get update -y && apt install -y --no-install-recommends \
 curl \
 osmium-tool \
&& rm -rf /var/lib/apt/lists/*

RUN mkdir /osmfiles \
&& mkdir /merged \
&& curl http://download.geofabrik.de/europe/monaco-latest.osm.pbf -o /osmfiles/monaco.osm.pbf \
&& curl http://download.geofabrik.de/europe/andorra-latest.osm.pbf -o /osmfiles/andorra.osm.pbf \
&& osmium merge /osmfiles/monaco.osm.pbf /osmfiles/andorra.osm.pbf -o /merged/merged.osm.pbf

FROM final

RUN mkdir /merged
COPY --from=build /merged /merged

As you see we use two stages: one called build and one called final. We install curl and osmium in our build stage, use them to create the merged file and finally copy just the merged folder to the final stage.

Once we type FROM final we’re “in” our final stage, so anything after FROM final happens there. The same goes for FROM ubuntu:20.04 as build on line 2. Notice that curl and osmium are only installed in our build stage. Once we reach the end of our Dockerfile we only keep the stage that we’re in: because the final stage is active at the end of the Dockerfile, everything in the build stage gets left behind. Let’s check out the size of our new image:

testimgbetter:latest        7342ee3948e8   3 seconds ago    75.1MB

Our new image is 75MB: we’ve saved almost 22MB, or a little over 22%, by leaving behind tools we don’t need anymore. This is already a fine improvement. Let’s find out how we can optimize more.
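Before moving on, one way to confirm that the build tools really were left behind is to ask a throwaway container whether they are still on the PATH. The sketch below assumes the two-stage image is tagged testimgbetter, as in the listing. The function name tool_in_image and the DOCKER variable are illustrative, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): report whether a tool is on the
# PATH inside an image. DOCKER can be overridden; it defaults to docker.
DOCKER="${DOCKER:-docker}"

tool_in_image() {
    image="$1"   # image to inspect, e.g. testimgbetter
    tool="$2"    # executable name, e.g. curl or osmium
    # Start a throwaway container and let its shell look the tool up.
    # Exit status 0 means the tool was found. A non-zero status means it was
    # not found OR the container could not be started at all.
    "$DOCKER" run --rm "$image" sh -c "command -v $tool" >/dev/null 2>&1
}

# Usage (requires a running Docker daemon and the image built above):
#   for t in curl osmium; do
#       tool_in_image testimgbetter "$t" && echo "$t: present" || echo "$t: absent"
#   done
```

For the two-stage image both tools should be reported as absent, while the single-stage image from part 1 should still contain them.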


3. A multi-stage build with a smaller final image

In the previous part we used the ubuntu:20.04 image for both the build and final stage. For the build-stage this is pretty logical; here we do complex things: installing packages, downloading stuff, git cloning, in other cases maybe even use cmake or make to compile code! The final stage, however, does not have to be as smart as the build stage.

The idea of this part is that you need a big, rich image like ubuntu:20.04 for the complex build stage. Then we can copy the results from the build stage to a final stage that is based on a much simpler image. In our case we can even use a very small image like Alpine. Let’s adjust our Dockerfile a bit by changing only the first line:

FROM alpine:3.14 AS final

Now, once our build stage is done, we only copy our merged file into the Alpine stage. Let’s check out the total size of the newly built image:

testimgbetter   latest   8ad3278671e1   17 minutes ago   7.95MB

We went from 75MB to a little under 8MB: a reduction of almost 90%!
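The percentages are easy to check from the three sizes reported in this article (96.9MB, 75.1MB and 7.95MB). The snippet below is a small sketch using awk for the floating-point arithmetic; the function names reduction_pct and shrink_factor are illustrative.

```shell
#!/bin/sh
# reduction_pct BEFORE AFTER: percentage saved going from BEFORE to AFTER.
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# shrink_factor BEFORE AFTER: how many times smaller AFTER is than BEFORE.
shrink_factor() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", before / after }'
}

reduction_pct 96.9 75.1   # normal build -> two-stage build:       22.5
reduction_pct 75.1 7.95   # two-stage build -> Alpine final stage: 89.4
reduction_pct 96.9 7.95   # normal build -> Alpine final stage:    91.8
shrink_factor 96.9 7.95   # normal build vs. Alpine final stage:   12.2
```

So compared with the normal build from part 1, the final image is roughly 12 times smaller.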

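Note that using Alpine images does not always work: for example, binaries compiled against glibc in an Ubuntu build stage will generally not run on Alpine, which uses musl instead. In our case we only copy a data file, so this is not a problem. The idea of this part of the article is that the final image can be smaller than the build stage.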


Nice and organized (image by Barn Images on unsplash)

Conclusion

In this article we’ve focused on understanding the way docker builds and minimizing docker images by removing all tools that we don’t need anymore. We can do this very efficiently with multiple stages: performing complex tasks like installing and building in a rich image, then copying the result to the slimmer final image and leaving behind the build-stage.

The examples in this article are pretty minimal; only leaving behind curl and osmium. There are examples where I’ve used a bulkier building stage and have saved over 1.5GB, going from an image size of 1.8GB to 0.3GB just by leaving behind artifacts that we don’t need anymore!

I hope this article was clear, but if you have suggestions or clarifications please comment so I can make improvements. In the meantime, check out my other articles on all kinds of programming-related topics.

Happy coding!

— Mike
