Building a Python Docker Image using a Private PyPI Repository
At work, we needed a Docker image to use as our Ray head node. It has to contain Python modules that are stored in our private PyPI repository. To build the Docker image and install the Python dependencies, I first tried basic authentication in the index URL passed to pip, as follows:
RUN pip install --index-url https://username:password@nexus.example.com some-python-module==0.25.0
This works. And since this Dockerfile is stored in a private version control repository, you might think that's OK. However, keeping credentials in clear text is not a good idea: anyone who gains access to your version control repository, for example through a security breach, also gains access to your PyPI repository.
Using Environment Variables
A common technique when we do not want to provide sensitive information directly is to read it from an environment variable that is injected from a secure source such as Vault. I could not do that here because pip has no dedicated environment variables for a username and password. The options I considered were basic authentication in the index URL or a .netrc file.
The .netrc file needs to be stored in your user's home directory and contain the credentials for the server you're using.
machine nexus.example.com
login emre-aydin
password qwerty
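As a rough sketch of how the environment-variable habit and the .netrc requirement can meet, the snippet below renders a single-entry .netrc from values you would read from injected variables and then resolves it with Python's standard `netrc` module, which follows the same file format. The helper names `write_netrc` and `lookup` are made up for this illustration, and the sketch assumes the login and password contain no whitespace.

```python
import netrc
import os
from pathlib import Path


def write_netrc(path: Path, machine: str, login: str, password: str) -> None:
    """Write a single-entry .netrc that only the owner can read or write."""
    content = f"machine {machine}\nlogin {login}\npassword {password}\n"
    # Create the file with 0600 permissions from the start instead of
    # tightening them after the secret has already been written.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)


def lookup(path: Path, machine: str) -> tuple[str, str]:
    """Return (login, password) for a machine from the given .netrc file."""
    entry = netrc.netrc(str(path)).authenticators(machine)
    if entry is None:
        raise KeyError(f"no .netrc entry for {machine}")
    login, _account, password = entry
    return login, password
```

In practice the login and password would come from something like `os.environ["NEXUS_USER"]` and `os.environ["NEXUS_PASSWORD"]`; those variable names are assumptions, not something pip itself reads.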
For the Docker build to use .netrc, we need to make it available to the pip process that installs the dependency. For that, we have to COPY the file into the image before executing pip. To make sure that user code running in containers based on this image cannot read the credentials, we delete .netrc afterwards.
FROM python:3.9
# pip looks for .netrc in the home directory of the build user (root here)
COPY .netrc /root/.netrc
RUN pip install --index-url https://nexus.example.com some-python-module==0.25.0 \
    && rm /root/.netrc
Knowing that Docker images are file system layers stacked on top of each other, I wondered whether the file was still readable somehow. To inspect the layers of the image, I used the dive tool. In the following screenshot, you can see that the .netrc file we intended to keep secret is still stored in one of the layers of the image:
If someone gains access to the Docker agent that runs on our clusters, or gets hold of the Docker image in another way, they will have access to the credentials for our Nexus repository as well.
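The same check that dive makes visually can be scripted. The following is a minimal sketch, assuming the image was exported with `docker save` and that each layer appears in the resulting tarball as a nested tar archive (plain or compressed). The function name `layers_containing` is hypothetical.

```python
import tarfile
from pathlib import PurePosixPath


def layers_containing(image_tar: str, filename: str) -> list[str]:
    """List nested layer archives in an image tarball that contain `filename`."""
    hits = []
    with tarfile.open(image_tar) as outer:
        for member in outer.getmembers():
            if not member.isfile():
                continue
            blob = outer.extractfile(member)
            try:
                # Layers are tar archives themselves; other blobs (JSON
                # manifests, configs) fail to open as tar and are skipped.
                with tarfile.open(fileobj=blob, mode="r:*") as layer:
                    names = [PurePosixPath(n).name for n in layer.getnames()]
            except tarfile.TarError:
                continue
            # A later "rm" only adds a whiteout entry (.wh.<name>) to a newer
            # layer; the original file stays in the layer that added it.
            if filename in names:
                hits.append(member.name)
    return hits
```

Running `layers_containing("image.tar", ".netrc")` against an export of the image built above should report the layer created by the COPY instruction, even though a later layer deletes the file.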
Enter Docker Secret Mounts
Docker BuildKit has a neat feature called secret mounts that solves this problem. It mounts a local file into the build container only for the duration of a single RUN instruction. This way, the secret information that we store in the .netrc file does not end up in the final image or any of its layers.
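FROM python:3.9
# Mount the secret directly at root's ~/.netrc; it exists only during this RUN step.
# Copying it with cp would write the file into the layer again.
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    pip install --index-url https://nexus.example.com some-python-module==0.25.0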
The secret with ID netrc above is injected into the GitHub Actions workflow by GitHub from the repository secrets, which means it comes from a secure source and is not accessible in plain text to anyone.
- name: Build and push
  uses: docker/build-push-action@v3
  with:
    context: .
    push: true
    secrets: |
      "netrc=${{ secrets.NET_RC }}"
    tags: nexus.example.com/ray
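To try the same build outside GitHub Actions, the secret can be handed to BuildKit with the `--secret` flag of `docker build`. The snippet below is only a sketch: `build_command` and `build` are hypothetical helpers, and it assumes a Docker installation with BuildKit available and a .netrc file on the local machine.

```python
import os
import subprocess


def build_command(tag: str, netrc_path: str, context: str = ".") -> list[str]:
    """Assemble a docker build invocation that exposes netrc_path as secret 'netrc'."""
    return [
        "docker", "build",
        # The id must match the id used in the Dockerfile's --mount=type=secret.
        "--secret", f"id=netrc,src={netrc_path}",
        "--tag", tag,
        context,
    ]


def build(tag: str, netrc_path: str, context: str = ".") -> None:
    """Run the build with BuildKit enabled; raises CalledProcessError on failure."""
    env = {**os.environ, "DOCKER_BUILDKIT": "1"}
    subprocess.run(build_command(tag, netrc_path, context), check=True, env=env)
```

Passing the path as a list element rather than through a shell string keeps the file name from being interpreted by the shell, and the secret's content never appears on the command line.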
Conclusion
It might be tempting and easy to pass secret information straight into your build process, but doing so can have unforeseen security implications. In this post, we have seen such an example and arrived at a solution by improving the build incrementally.