Appendix 1 - Building Containers [Docker]
Docker, Dockerfile, Docker Hub
Starting from a simple Dockerfile, we will adopt best practices sequentially and see their effect.
Prerequisites
- Command line (Unix)
- Install Docker
  To follow along on your own computer, please install Docker Desktop and register for a free account on Docker Hub. Both can be found here.
  After the installation, open a terminal (“cmd” on Windows) and make sure you can execute the command docker run hello-world successfully.
- Set BUILDKIT_PROGRESS=plain for plain output (or remember to run docker build --progress=plain ...).
Building for (not on) Rivanna
- Docker
  - No Docker on Rivanna
  - Docker Hub
  - Can be converted into Singularity
- Singularity
  - Users cannot build on Rivanna (needs sudo privilege)
  - Singularity Library/Hub (more limitations)
  - Refer to workshop in Spring 2020
Intro to Dockerfile: lolcow
fortune | cowsay | lolcat
- fortune cookie
- talking cow
- rainbow color
Steps:
- Choose a base image
- Install software dependencies (if any)
- Install software
Step 1: Choose a base image
Use FROM to specify the base image. In this example, we’ll use Ubuntu 22.04. You do not need to install this on your computer; Docker will pull it from Docker Hub when you build the image.
FROM ubuntu:22.04
- OS: ubuntu, debian, centos, …
- Doesn’t have to be a bare OS
  - python, continuumio/miniconda3, node, nvidia/cuda, etc.
Steps 2 & 3: Install software
Use RUN to specify the actual commands to be executed (as if you were typing them on the command line).
FROM ubuntu:22.04
RUN apt-get install fortune cowsay lolcat
Save this file as Dockerfile and run docker build . (note the trailing dot, which is the build context). Does it work?
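No: the build fails because apt cannot locate the packages. The base image ships with an empty package list, so we need to update it first. Let’s modify our Dockerfile and build again.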
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install fortune cowsay lolcat
The build still fails, this time because apt-get install prompts for confirmation and nothing answers the prompt during a build. To pass “yes” automatically, add -y.
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y fortune cowsay lolcat
This finally works. The build returns an image ID, which we can use to run the image:
docker run --rm -it <img>
But this only gives us a shell prompt, where fortune, cowsay, and lolcat don’t seem to work. What’s wrong?
Summary so far
- Build:
  - Update package manager
  - Automatic yes to prompt
- Run:
  - Use --rm to remove container after it exits
  - Use -it for interactive processes (e.g. shell)
- Problems:
  - User needs to know path to executable
  - User just wants to run “lolcow”
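Use ENV to set environment variable
On Ubuntu these executables are installed under /usr/games, which is not in the default PATH of the image. The statement ENV PATH=/usr/games:${PATH} is equivalent to export PATH=/usr/games:${PATH}, but it is stored in the image and therefore preserved at runtime. With it we can execute fortune, cowsay, and lolcat directly without specifying the full path.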
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y fortune cowsay lolcat
ENV PATH=/usr/games:${PATH}
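The effect of prepending a directory to PATH can be tried outside Docker. The following is a minimal sketch, not part of the workshop material: demo_games and hello-cow are made-up names standing in for /usr/games and the fortune, cowsay, and lolcat executables.

```shell
#!/bin/sh
# Create a throwaway directory holding one tiny executable.
# "demo_games" and "hello-cow" are hypothetical names for this demo only.
demo_games="$(mktemp -d)/demo_games"
mkdir -p "$demo_games"
printf '#!/bin/sh\necho moo\n' > "$demo_games/hello-cow"
chmod +x "$demo_games/hello-cow"

# Before: the directory is not on PATH, so the lookup by name fails.
if command -v hello-cow >/dev/null 2>&1; then before="found"; else before="not found"; fi

# After: same form as ENV PATH=/usr/games:${PATH} in the Dockerfile.
PATH="$demo_games:${PATH}"
export PATH
if command -v hello-cow >/dev/null 2>&1; then after="found"; else after="not found"; fi

echo "before: $before"
echo "after:  $after"
hello-cow
```

The difference from ENV is that this export only lasts for the current shell session, whereas ENV is recorded in the image.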
Use ENTRYPOINT to set default command
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y fortune cowsay lolcat
ENV PATH=/usr/games:${PATH}
ENTRYPOINT fortune | cowsay | lolcat
Finally, we can simply run docker run --rm -it <img> to get the desired behavior. You now know how to build a working Docker container.
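The pipes in the ENTRYPOINT line work because Docker runs a shell-form ENTRYPOINT through /bin/sh -c. The sketch below mimics that outside Docker with harmless stand-in commands (echo and tr in place of fortune, cowsay, and lolcat); the variable names are made up for illustration.

```shell
#!/bin/sh
# Stand-in for: ENTRYPOINT fortune | cowsay | lolcat
# Docker runs a shell-form ENTRYPOINT roughly as: /bin/sh -c "<command line>"
pipeline='echo "hello from the cow" | tr "a-z" "A-Z"'

# The shell started by -c is what interprets the pipe character.
result="$(/bin/sh -c "$pipeline")"
echo "$result"
```

The equivalent exec form would spell out the shell explicitly, as in ENTRYPOINT ["/bin/sh", "-c", "fortune | cowsay | lolcat"]. Without a shell in between, the pipe characters would be passed to fortune as literal arguments.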
4 Best Practices
While our container is functional, there is a lot of room for improvement. We shall look at some important best practices for writing Dockerfiles.
0. Package manager cache busting
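The idea of “cache busting” is to force apt-get update to run again whenever a change is made to the apt-get install line. Docker caches each RUN statement as a layer, so a separate RUN apt-get update would be reused from the cache even after the list of packages changes. Combining both commands in one RUN statement ensures we get the latest packages (especially critical security updates) should we make changes and rebuild the image in the future.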
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
fortune cowsay lolcat
ENV PATH=/usr/games:${PATH}
ENTRYPOINT fortune | cowsay | lolcat
- Save this as Dockerfile0, which will be the basis for comparison
- For consistency we shall use the same tag as the number
docker build -t <user>/lolcow:0 -f Dockerfile0 .
1. Clean up
Almost all package managers leave behind some cache files after installation that can be safely removed. Depending on your application, they can easily accumulate up to several GBs. Let’s see what happens if we try to clean up the cache in a separate RUN statement.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
fortune cowsay lolcat
RUN rm -rf /var/lib/apt/lists/* # clean up command for apt
ENV PATH=/usr/games:${PATH}
ENTRYPOINT fortune | cowsay | lolcat
docker build -t <user>/lolcow:0.5 -f Dockerfile0.5 .
docker images | grep lolcow
You should see that there is no difference in the image size. Why?
- Each statement creates an image layer.
- If you try to remove a file from a previous layer, Docker will make a “whiteout” so that you can’t see it, but the file is still there.
- The file can be retrieved.
- This is not just a size issue but also a security pitfall.
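To see the layers for yourself, docker history lists one row per layer together with its size. A minimal sketch, assuming the 0.5 image above has been built; show_layers is a helper name and DOCKER_USER a placeholder variable, both introduced here for illustration.

```shell
#!/bin/sh
# DOCKER_USER is a placeholder; set it to your Docker Hub username.
DOCKER_USER="${DOCKER_USER:-user}"

# Print the layer history of an image, or a hint if that is not possible.
show_layers() {
    if command -v docker >/dev/null 2>&1; then
        docker history "$1" || echo "could not inspect $1; is it built and is Docker running?"
    else
        echo "docker is not installed on this machine"
    fi
}

show_layers "${DOCKER_USER}/lolcow:0.5"
```

Expect the row for the separate rm -rf statement to add essentially nothing, while the apt-get layer below it keeps its full size.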
Very important! You must remove files in the same RUN statement as they are added.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
fortune cowsay lolcat && \
rm -rf /var/lib/apt/lists/*
ENV PATH=/usr/games:${PATH}
ENTRYPOINT fortune | cowsay | lolcat
docker build -t <user>/lolcow:1 -f Dockerfile1 .
docker images | grep lolcow
Now you should see that the clean-up is effective.
2. Only install what’s needed
The apt package manager often recommends related packages that are not really necessary. To disable recommendations, use --no-install-recommends.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
fortune fortunes-min cowsay lolcat && \
rm -rf /var/lib/apt/lists/*
ENV PATH=/usr/games:${PATH}
ENTRYPOINT fortune | cowsay | lolcat
- You may need to specify extra packages
  - fortune itself provides the executable without the message database
  - fortunes-min contains the message database
- See how Ubuntu reduced image size by 60%
3. Use a smaller base image
For installation of common packages, you may consider Alpine.
- BusyBox + package manager + musl libc (beware of compatibility issues)
- Presentation on Alpine Linux from DockerCon EU 17
Look for slim variants (e.g. debian:buster-slim) of a base image, if any.
FROM alpine:3.17
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories && \
apk add --no-cache fortune cowsay@testing lolcat@testing
ENTRYPOINT fortune | cowsay | lolcat
Note: An ENV statement is not needed here because the executables are installed under /usr/bin.
Image size comparison
$ docker images | grep lolcow | sort -nk 2 | awk '{print $1, $2, $NF}'
<user>/lolcow 0 207MB
<user>/lolcow 0.5 207MB
<user>/lolcow 1 167MB
<user>/lolcow 2 154MB
<user>/lolcow 3 45.9MB
Version | Description | Reduction (MB) | Reduction (%) |
---|---|---|---|
0 | (Basis of comparison) | - | - |
0.5 | Clean up in separate RUN | 0 | 0 |
1 | Clean up in same RUN | 40 | 19 |
- | Install only what’s needed | 13 | 6 |
2 | Combination of previous two | 53 | 26 |
3 | Alpine base image | 161 | 78 |
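The reduction figures can be reproduced from the sizes in the docker images listing. The sketch below is one way to do the arithmetic in the shell; reduction is a helper name made up for this example, and it assumes awk is available.

```shell
#!/bin/sh
# Size of version 0 in MB, the basis of comparison.
base=207

# Print "<reduction in MB> <reduction in %>" for a given image size,
# both rounded to whole numbers.
reduction() {
    awk -v base="$base" -v size="$1" \
        'BEGIN { d = base - size; printf "%.0f %.0f\n", d, 100 * d / base }'
}

# Version and size pairs copied from the listing.
while read -r version size; do
    echo "version $version: $(reduction "$size")"
done <<'EOF'
0.5 207
1 167
2 154
3 45.9
EOF
```

The row without a version number is the difference between versions 2 and 1: 53 - 40 = 13 MB, or about 6% of the original 207 MB.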
Reference: Best practices for writing Dockerfiles
Summary
- Choose a base image (FROM)
- Install software dependencies (RUN)
- Install software (RUN)
- Clean up (in same RUN statement as installation)
- Define environment variables (ENV)
- Define default command (ENTRYPOINT)
Push to Docker Hub
You can push your image to Docker Hub easily. First, let’s set our lolcow version 3 as the latest.
docker tag <user>/lolcow:3 <user>/lolcow:latest
Then sign in to Docker Hub and push as follows:
docker login
docker push <user>/lolcow:latest
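Once the image is on Docker Hub, it can be converted into a Singularity image on a system where Docker is unavailable, such as Rivanna. A minimal sketch, assuming a Singularity 3.x command is on the PATH there; pull_as_sif is a helper name and DOCKER_USER a placeholder variable, both introduced here for illustration.

```shell
#!/bin/sh
# DOCKER_USER is a placeholder; set it to your Docker Hub username.
DOCKER_USER="${DOCKER_USER:-user}"
image_uri="docker://${DOCKER_USER}/lolcow:latest"

# Pull the Docker Hub image and convert it to a Singularity image file.
pull_as_sif() {
    if command -v singularity >/dev/null 2>&1; then
        singularity pull "$1" || echo "pull failed; check the image name and network access"
    else
        echo "singularity not found; would run: singularity pull $1"
    fi
}

pull_as_sif "$image_uri"
```

With default settings the pull should write a file named lolcow_latest.sif in the current directory, which can then be started with singularity run lolcow_latest.sif.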
Docker Hub interface
In your browser, go to https://hub.docker.com/r/<user>/lolcow.
- Overview:
  - Sync with GitHub to update README.md; or
  - Use docker-pushrm
- Tags:
  - List all versions
  - View image history if Dockerfile not provided
  - Compressed size is much smaller than size on disk
Case Studies (hands-on)
By now, we know how to write a simple Dockerfile to install software using the distro’s package manager. In practice, we may encounter software that does not exist in the package list. How do we deal with such cases?
Compiled language (C++)
https://github.com/lilab-bcb/cumulus_feature_barcoding
Hints:
- You do not have to start from a bare OS. Search for gcc on Docker Hub.
- Install build-essential if you are starting from a bare OS.
- Version pinning: to choose a specific version, download from https://github.com/lilab-bcb/cumulus_feature_barcoding/releases (you will need wget).
Interpreted language (Python)
https://docs.qiime2.org/2022.8/install/native/#install-qiime-2-within-a-conda-environment
Hints:
- Click on “Linux” to get the URL for the yaml file. Download the yaml file in the same directory as your Dockerfile.
- You do not have to start from a bare OS in your Dockerfile. Search for miniconda3 on Docker Hub.
- (Recommended) There is a much faster dependency solver than conda: micromamba. See here for instructions.
  - Use the suggested COPY and ENTRYPOINT statements.
- After you’re done, compare with the official Dockerfile and image size. What is the biggest reason for the difference?
General Remarks
- Play with different base images and package managers.
- If you encounter a Docker statement that you have not used before, first check the official documentation for best practices.
- A comprehensive list of dependencies may be lacking. Some developers may not specify any at all. You will have to rely on a combination of experience, error messages, and web search. (Most likely all of the above.)
- Especially for Python packages, versions may be too permissive or too restrictive such that, in either case, future installation of the application will fail. (I have encountered both.) Tweak the versions until it works.
- The next step is “multi-stage build” which is covered in the Minimal Containers workshop. There you will learn how to distinguish between buildtime versus runtime dependencies and separate them out.
Clean Up
If you build containers often, you can run out of disk space quickly. To clean up:
- Run docker rmi <IMAGE_ID> to remove a specific image.
- Run docker system prune to clean up cache. (This will not affect images that are tagged.)
  $ docker system prune
  WARNING! This will remove:
    - all stopped containers
    - all networks not used by at least one container
    - all dangling images
    - all dangling build cache
  Are you sure you want to continue? [y/N] y
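Before pruning, it can help to see where the space is going. docker system df summarizes the disk used by images, containers, local volumes, and build cache. A small sketch that guards against Docker being unavailable; show_docker_usage is a helper name made up for this example.

```shell
#!/bin/sh
# Print a summary of Docker disk usage, or a hint if that is not possible.
show_docker_usage() {
    if command -v docker >/dev/null 2>&1; then
        docker system df || echo "could not reach the Docker daemon"
    else
        echo "docker is not installed on this machine"
    fi
}

show_docker_usage
```

Running it again after docker system prune shows how much space was reclaimed.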
References
- UVA Rivanna-Docker GitHub
  - Dockerfiles by UVA Research Computing
  - Tips
- Best practices for writing Dockerfiles
- Natanael Copa, Small, Simple, and Secure: Alpine Linux under the Microscope, DockerCon EU (2017)