
We have around 100 tests; each test connects to a postgres instance and consumes a database loaded with some data. The tests edit and change that data, so we reload the postgres database for each test.

This takes a really long time, so I thought of using Docker for it. I'm new to Docker, so these are the steps I'm using:

1) Create one postgres container, load it with the test database I want, and get it ready and polished.

2) Use this command to save the container as a tar file:

 docker save -o postgres_testdatabase.tar postgres_testdatabase

3) For each test, load the tar into an image:

  docker load -i postgres_testdatabase.tar

4) Run a container with the postgres instance:

docker run -i -p 5432 postgres_testdatabase

5) The test runs and changes the data.

6) Destroy the container and start a fresh container with a fresh test database.

7) Run the second test, and so on (sketched below).
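Roughly, the per-test cycle I have in mind looks like this from the Docker Quickstart Terminal (names are the ones from the steps above, and the fixed host port is just for clarity):

    # rough sketch of the per-test cycle described above
    docker load -i postgres_testdatabase.tar                  # step 3: load the tar into an image
    CID=$(docker run -d -p 5432:5432 postgres_testdatabase)   # step 4: start a container from it
    # ... run one test against the container's postgres here ...
    docker rm -f "$CID"                                       # step 6: destroy the container
    # repeat for the next test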

My problem is that when I back up a container to a tar file, load it, and then run a new container, I do not get my database; I basically get a fresh postgres installation with none of my databases.

What am I doing wrong?

EDIT:

I tried one of the suggestions, committing my container to an image before saving it, as follows:

I committed my updated container to a new image, saved that image to a tar file, deleted my existing container, loaded the tar file, and then ran a new container from my saved image. I still don't see my databases. I believe it has something to do with volumes. How do I do this without volumes? How do I force all my data to stay in the container so it gets backed up with the image?

EDIT2

Warmoverflow suggested I use an SQL file to load all my data when the image is built. This won't work in my case, since the data is carefully authored using other software (ArcGIS), and it contains some complex blob geometry fields, so loading it from an SQL script won't work. He also suggested that I don't need to save the data as a tar file if I'm spawning containers on the same machine: once I'm satisfied with my data and commit it to an image, I can run new containers from that image directly. Thanks for clarifying this. The remaining problem is how to keep my database within my image, so that when I run a container from the image, the database comes with the container.

EDIT3

So I found a workaround inspired by Warmoverflow's suggestion, which should solve my problem. However, I'm still looking for a cleaner way to do this.

The solution is to do the following:

  • Create a fresh postgres container.
  • Populate your database as you please; in my case I use ArcGIS to do so.
  • Use pg_dumpall to dump the entire postgres instance into a single file with the command below. We can run it from any postgres client, and we don't have to copy the dump file into the container. I'm running this from Windows.

    C:\Program Files\PostgreSQL\9.3\bin>pg_dumpall.exe -h 192.168.99.100 -p 5432 -U postgres > c:\Hussein\dump\pg_test_dump.dmp

  • You can now safely delete your container.

  • Create a new postgres container.
  • Run this command against the new container's postgres instance to load your dump:

    C:\Program Files\PostgreSQL\9.3\bin>psql -f c:\Hussein\dump\pg_test_dump.dmp -h 192.168.99.100 -p 5432 -U postgres

  • Run the test. The test will mess up the data, so to reload it we simply repeat the steps above (a rough per-test script is sketched below).
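For reference, here is a rough sketch of that reload cycle as a script run from the Docker Quickstart Terminal (it assumes pg_dumpall and psql are on the PATH, the docker-machine IP is 192.168.99.100 as above, the container accepts passwordless connections for the postgres user, and the container name pg_test is just a placeholder):

    # one-time: dump the authored database from the polished container
    pg_dumpall -h 192.168.99.100 -p 5432 -U postgres > pg_test_dump.dmp

    # per test: replace the container and reload the dump
    docker rm -f pg_test 2>/dev/null                  # remove the previous (dirty) container, if any
    docker run -d --name pg_test -p 5432:5432 postgres:9.3
    sleep 5                                           # crude wait for postgres to come up
    psql -f pg_test_dump.dmp -h 192.168.99.100 -p 5432 -U postgres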

I would still really like the image to have the database "in it", so that when I run a container from the image, I get the database. It would be great if anyone could suggest a solution for that; it would save me a huge amount of time.

EDIT4

Warmoverflow finally solved it! See the answer below.

Thanks

1 Answer


docker save is for images (saving an image as a tar file). What you need is docker commit, which commits container changes to an image; you can then save that image to a tar file. But if your database is the same for all tests, you should build a custom image using a Dockerfile and then run all your containers from that single image.
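For example, the commit-then-save flow looks roughly like this (container and image names are placeholders):

    docker commit my_pg_container my_pg_image     # snapshot the container's changes into a new image
    docker save -o my_pg_image.tar my_pg_image    # save the image (not the container) as a tar file
    docker load -i my_pg_image.tar                # later, or on another machine, load it back
    docker run -d -p 5432:5432 my_pg_image        # and run new containers from it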

If your data can be loaded from an SQL file, you can follow the instructions in the "How to extend this image" section of the official postgres Docker page (https://hub.docker.com/_/postgres/). Create a Dockerfile with the following content:

FROM postgres
RUN mkdir -p /docker-entrypoint-initdb.d
ADD data.sql /docker-entrypoint-initdb.d/

Put your data.sql file and the Dockerfile in a new folder and run docker build -t custom_postgres . there. This builds a customized image for you, and every time you run a new container from it, the SQL file is loaded on boot.
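A rough usage sketch (the IP is the docker-machine address used elsewhere in the question, and the container name and password are placeholders):

    # build the image with data.sql baked in, then run a container from it
    docker build -t custom_postgres .
    docker run -d --name custom_pg -p 5432:5432 -e POSTGRES_PASSWORD=123456 custom_postgres
    # the entrypoint runs data.sql while initializing the fresh data directory;
    # connect (password 123456) and verify the data is there
    psql -h 192.168.99.100 -p 5432 -U postgres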

[Update]

Based on the new information in the question, the cause of the issue is that the official postgres image defines a VOLUME at the postgres data folder /var/lib/postgresql/data. VOLUME is used to persist data outside the container (for example, when you use docker run -v to mount a host folder into the container), and thus any data inside the VOLUME is not saved when you commit the container itself. While this is normally a good idea, in this specific situation we actually need the data not to be persistent, so that a fresh new container with the same unmodified data can be started every time.
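You can see the declared volume by inspecting the image (the exact output format depends on your Docker version):

    docker inspect --format '{{ .Config.Volumes }}' postgres
    # prints something like: map[/var/lib/postgresql/data:{}]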

The solution is to create your own version of the postgres image, with the VOLUME removed.

  1. The files are at https://github.com/docker-library/postgres/tree/master/9.3
  2. Download both files to a new folder
  3. Remove the VOLUME line from the Dockerfile
  4. In Docker Quickstart Terminal, switch to that folder, and run docker build -t mypostgres ., which will build your own postgres image with the name mypostgres.
  5. Use docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 mypostgres to start your container. The postgres db is then available at postgres:123456@192.168.99.100:5432
  6. Put in your data as normal using ArcGIS
  7. Commit the container with docker commit container_id_from_step_5 mypostgres_withdata. This creates your own postgres image with data.
  8. Stop and remove the intermediate container with docker rm -f container_id_from_step_5
  9. Every time you need a new container, in Docker Quickstart Terminal, run docker run -d -p 5432:5432 mypostgres_withdata to start a container, and remember to stop or remove the used container afterwards so that it won't keep occupying port 5432 (see the sketch below).
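So the per-test cycle becomes roughly (the shell variable is just a placeholder for the container id):

    CID=$(docker run -d -p 5432:5432 mypostgres_withdata)   # fresh container with the data baked in
    # ... run one test against 192.168.99.100:5432 ...
    docker rm -f "$CID"                                      # free port 5432 for the next test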

7 Comments

Also check the image for volumes. Your approach only works for images without volumes.
Thanks for your suggestion, I tried it as follows: I committed my updated container to a new image, saved that image to a tar file, deleted my existing container, loaded the tar file, and then ran a new container from my saved image. I still don't see my databases. I believe it has something to do with volumes. How do I do this without volumes? How do I force all my data to stay in the container so it gets backed up with the image?
Are you using the official postgres image at hub.docker.com/_/postgres? On that page, in the "How to extend this image" section, there are instructions on how to add an SQL file to the image; it will then load the SQL file every time you run a new container from the image, which should suit your needs.
Please see my updated answer. By the way, saving the image as a tar file is not necessary; you can run new containers directly from your committed image (unless you need to move the image to another machine).
Actually my data is edited manually using other software (ArcGIS), and it has some complex blob geometry fields, so loading it from an SQL file won't work. I liked your suggestion that I don't need to save the data as a tar file: once I'm satisfied with my data, I commit it to an image and run containers from that image. Yes, I'm running the whole thing on the same machine. Thanks for clarifying this. The only problem is how to keep my database within my image, so that when I run a container from the image, it comes with the database.