
I have the following Dockerfile

FROM postgres:9.6.18
ENV POSTGRES_PASSWORD postgres
ENV POSTGRES_DB import
COPY docker/admin-db/*.sql /docker-entrypoint-initdb.d/

This works fine, but every time I build and start up my cluster (docker-compose, combined with an api container) it takes about 2 minutes to load all the SQL files (they contain test data). This is not very agile, so I would like to load the data at image build time, not when the container starts. As the database image will not change frequently, the layers that load the data would usually come from the build cache.

How can I start the container during image creation, so that the data does not need to be loaded every time the container starts?

3 Answers


I have read quite extensively on this topic and found that the vast majority of recommendations involve Docker volumes. But I was convinced that there must be another way, and I've come up with a solution that works great.

  • Make a directory called init-scripts and put initialization SQL in there.
  • Paste the bash script below into build-bootstrapped-postgres-docker-image.sh alongside said directory.
  • Change POSTGRES_DB and POSTGRES_PASSWORD in the script if you wish.
  • Make the script executable.
  • Execute: ./build-bootstrapped-postgres-docker-image.sh postgres:12.9-alpine your-db 1.0

This will build a pre-populated Postgres image your-db:1.0 on top of postgres:12.9-alpine.


#!/bin/bash
set -e

# set -o xtrace

PG_IMAGE_NAME=$1   # base image, e.g. postgres:12.9-alpine
IMG_NAME=$2        # name of the image to build
IMG_TAG=$3         # tag of the image to build
IMG_FQN="$IMG_NAME:$IMG_TAG"

CONTAINER_NAME="$IMG_NAME-$IMG_TAG-container"

echo 'killing any existing container running with the same name'
docker kill "$CONTAINER_NAME" 2>/dev/null || true

echo 'running postgres container and bootstrapping schema/data... please wait.'
# Start the container with an idle bash entrypoint so the bootstrap can be
# driven step by step. PGDATA=data resolves to /data (the working directory
# is /), deliberately *outside* the volume declared by the base image,
# because docker commit does not capture volume contents.
docker container run \
  --rm \
  --interactive \
  --tty \
  --detach \
  --volume "${PWD}/init-scripts":/docker-entrypoint-initdb.d \
  --name "$CONTAINER_NAME" \
  --entrypoint /bin/bash \
  --env POSTGRES_DB=database \
  --env POSTGRES_PASSWORD=password \
  --env PGDATA=data \
  "$PG_IMAGE_NAME"

# Run the stock entrypoint in the background: it initializes the cluster,
# executes the scripts in /docker-entrypoint-initdb.d, then starts postgres.
docker container exec -d "$CONTAINER_NAME" sh -c 'docker-entrypoint.sh postgres >> bootstrap.log 2>&1'

echo 'waiting for container... this may take a while'
# Block until the server logs that it is listening, i.e. the bootstrap is done.
grep -q 'IPv4' <(docker exec "$CONTAINER_NAME" tail -f /bootstrap.log)

echo 'removing the initialization SQL files'
docker container exec "$CONTAINER_NAME" rm -rf /docker-entrypoint-initdb.d/*

echo 'stopping pg'
docker container exec -u postgres "$CONTAINER_NAME" pg_ctl stop -D /data

echo 'committing the container to a new image'
# Restore the stock entrypoint and command so the committed image behaves
# like the original postgres image, just with the data already in place.
docker container commit \
  --change='CMD ["postgres"]' \
  --change='ENTRYPOINT ["docker-entrypoint.sh"]' \
  --change='USER postgres' \
  "$CONTAINER_NAME" "$IMG_FQN"

# cleanup!
docker kill "$CONTAINER_NAME"

echo "successfully built $IMG_FQN"

Now you can just run the container:

docker run your-db:1.0
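
Since the question's setup uses docker-compose, the pre-built image can also be referenced there directly. A minimal sketch, following the compose style used elsewhere on this page (the service names and api build context are placeholders, not part of the script above):

  db:
    image: your-db:1.0      # the pre-populated image built by the script
  api:
    build: .                # placeholder for the question's api container
    depends_on:
      - db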



If you want a simple solution that's a bit manual, then you would:

  1. Start the postgres container the normal way to apply the test data.
  2. Exit it, and run docker commit to save the container as a new image.
  3. Use that image as the basis for your testing; a sketch of this flow follows.
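
A minimal sketch of that manual flow, assuming the question's Dockerfile layout (the seed-db and my-test-db names are examples; PGDATA is pointed outside the volume declared by the base image, because docker commit does not capture volume contents):

# 1. Run the container once so the init scripts load the test data.
docker run -d --name seed-db \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=import -e PGDATA=/data \
  -v "$PWD/docker/admin-db":/docker-entrypoint-initdb.d \
  postgres:9.6.18

# 2. Watch the logs until initialization completes
#    ("database system is ready to accept connections"), then:
docker logs -f seed-db
docker stop seed-db
docker commit seed-db my-test-db:latest

# 3. Reference my-test-db:latest in docker-compose instead of postgres:9.6.18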

A completely automated solution that applies the test scripts at docker build time is going to have to understand how to start postgres. You can investigate the problem like this:

# Create a postgres container.
docker create --name postgres postgres:9.6
# Copy the entrypoint script out.
docker cp postgres:/usr/local/bin/docker-entrypoint.sh docker-entrypoint.sh

Now, a modified version of the docker-entrypoint.sh script can be used as a

RUN apply-sql-scripts.sh

in your Dockerfile. It looks potentially complicated.
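
For the record, a rough sketch of what that could look like, untested and assuming the stock entrypoint of that era ends with exec "$@": a patched copy of the entrypoint exits after running the init scripts instead of exec'ing the long-running server, so the RUN step terminates and the loaded data ends up in an image layer.

FROM postgres:9.6.18
# /data lies outside the VOLUME declared by the base image, so files
# written during the RUN step below survive into the image layer.
ENV POSTGRES_PASSWORD=postgres POSTGRES_DB=import PGDATA=/data
COPY docker/admin-db/*.sql /docker-entrypoint-initdb.d/
# Patch a copy of the entrypoint so it stops after the init scripts
# instead of exec'ing postgres, then run it once at build time.
RUN cp /usr/local/bin/docker-entrypoint.sh /tmp/init-entrypoint.sh \
 && sed -i 's/exec "$@"/echo "init done"/' /tmp/init-entrypoint.sh \
 && bash /tmp/init-entrypoint.sh postgres \
 && rm -rf /tmp/init-entrypoint.sh /docker-entrypoint-initdb.d/*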

1 Comment

Thanks for the answer. I really would like it automated, as the solution needs to be portable. I guess I have to dig into the script. I was hoping that someone had already figured this out here. :)

Use volumes instead of copying the data to /docker-entrypoint-initdb.d/ at build time. With volumes, the first time you bring up the container it will load all the data; after that it will just reuse the data that is already loaded (which is what you seem to need). As long as you do not delete the volume, your data will always be there when you restart.

Here is a sample:

  pgdb:
    image: postgres
    restart: always
    container_name: pgdb
    env_file: ./postgres/docker-compose.env
    volumes:
      - ./postgres/postgresDB:/var/lib/postgresql/data
      - ./postgres/postgresInit:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"

3 Comments

Correct me if I am wrong, but wouldn't that mean that all changes in the data will be persisted? I am looking for a solution where the database container starts off with the same data every time I run it.
I see. Yes, data changes would be persisted, but you could copy the folder that contains the data (./postgres/postgresDB in the example above) the first time you start the container. Then a simple script can reset that folder right before container start-up.
I could do that, but that would not be a very portable solution. I am still convinced that it should be possible to do it during image creation. In fact, I have been working on a solution that loads the data during creation. It's partially working for now, but I hope to dive into it a bit more over the next few days.
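
A minimal sketch of the reset script mentioned in the comments above, following the paths from the compose sample (postgresDB.pristine is a hypothetical snapshot copied from ./postgres/postgresDB after the first clean initialization):

#!/bin/bash
# Restore the bind-mounted data directory from its pristine snapshot,
# so the database starts with the same data every time.
docker-compose stop pgdb
rm -rf ./postgres/postgresDB
cp -a ./postgres/postgresDB.pristine ./postgres/postgresDB
docker-compose up -d pgdb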
