
I'm developing a shell script that loops through a series of Postgres database table names and dumps the table data. For example:

# dump data

psql -h $IP_ADDRESS -p 5432 -U postgres -c "BEGIN;" MYDB

for i in table1 table2 table3
do
    pg_dump -U postgres -h $IP_ADDRESS -p 5432 -t $i -a --inserts MYDB >> \
    out.sql
done

psql -h $IP_ADDRESS -p 5432 -U postgres -c "COMMIT;" MYDB

I'm worried about concurrent access to the database, however. Since there is no way to lock an entire database in Postgres, I tried to wrap a BEGIN and COMMIT around the loop (using psql, as shown above). This only produced a warning from the final psql command:

WARNING:  there is no transaction in progress

Is there any way to achieve this? If not, what are the alternatives?

Thanks!

  • I wrote about this topic a little while ago. The short version is "drive psql as a co-process with the coproc command, or use a scripting language with support for connecting to Pg". See stackoverflow.com/a/8305578/398670 . However, that won't help you if you're using pg_dump; it's only useful for psql alone. Commented Aug 25, 2012 at 2:00
  • BTW, there is a way to lock all other users (except superusers) out of the DB: you can REVOKE the CONNECT right on the database for everybody except yourself (though superusers can always connect), then boot everybody else off using pg_terminate_backend(pid) in a query against pg_stat_activity (see the sketch below). Not trivial, I'll admit. In this case I think you're barking up entirely the wrong tree with this approach anyway; see ruakh's answer. Commented Aug 25, 2012 at 2:06
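For reference, a rough sketch of that lock-out approach; the connection options are the question's placeholders, mydb stands in for the question's MYDB, and it assumes a 9.2-or-later server (older releases call the pg_stat_activity column procpid rather than pid):

# connect to the postgres maintenance DB so our own session isn't in mydb
psql -U postgres -h $IP_ADDRESS -p 5432 -d postgres \
    -c "REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;"
# then terminate every session still connected to the target database
psql -U postgres -h $IP_ADDRESS -p 5432 -d postgres \
    -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'mydb';"

Note that REVOKE only blocks new connections (hence the terminate step), and superusers are unaffected.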

1 Answer


Your script has two main problems. The first problem is practical: a transaction belongs to a single session (connection), so your first psql command, which just starts a transaction and then exits, has no lasting effect: the transaction ends when that psql process disconnects, and later commands do not share it. (That is also why the final COMMIT triggers the "there is no transaction in progress" warning.) The second problem is conceptual: under the default READ COMMITTED isolation level, changes made in transaction X aren't seen by transaction Y until transaction X is committed, but as soon as transaction X is committed, they're immediately seen by transaction Y, even if transaction Y is still in progress. This means that, even if your script did successfully wrap the entire dump in a single transaction, it wouldn't make any difference, because your dump could still see inconsistent results from one query to the next. (That is: at that isolation level it's meaningless to wrap a series of SELECTs in a transaction. A transaction only changes things if it contains DML statements, UPDATEs or INSERTs or DELETEs, or if you explicitly raise the isolation level to get a single snapshot.)
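You can see the first problem in isolation with the question's own commands: each psql invocation opens its own connection, so the BEGIN is rolled back as soon as the first psql exits, and the COMMIT runs in a brand-new session:

psql -h $IP_ADDRESS -p 5432 -U postgres -c "BEGIN;" MYDB   # transaction opens, then the session ends
psql -h $IP_ADDRESS -p 5432 -U postgres -c "COMMIT;" MYDB  # new session
# => WARNING:  there is no transaction in progress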

However, you don't really need your shell script to loop over your list of tables; rather, you can give pg_dump all the table names at once, by passing multiple -t flags:

pg_dump -U postgres -h $IP_ADDRESS -p 5432 \
    -t table1 -t table2 -t table3 -a --inserts MYDB >> out.sql

and according to the documentation, pg_dump "makes consistent backups even if the database is being used concurrently", so you wouldn't need to worry about setting up a transaction even if that did help.

(By the way, the -t flag also supports a pattern notation; for example, -t 'table*' would match all tables whose names begin with table. Quoting the pattern keeps the shell from expanding it before pg_dump sees it.)
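If the table list already lives in a shell variable (as in your original loop), you can build the flag list in one pass; a minimal sketch, with TABLES standing in for whatever list you were looping over:

# TABLES is a hypothetical space-separated list of table names
TABLES="table1 table2 table3"
T_FLAGS=""
for i in $TABLES
do
    T_FLAGS="$T_FLAGS -t $i"
done
pg_dump -U postgres -h $IP_ADDRESS -p 5432 $T_FLAGS -a --inserts MYDB > out.sql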


Comments

Hi ruakh, thanks for your advice. I am looping through the tables because I need the dump output to be in a specific order for when I restore it. While pg_dump is only doing SELECTs, I also have a loop that runs queries to insert selected data into temp tables, which is a further concern for me. Finally, when I use psql to restore the dump file, is that performed as a safe transaction?
@littleK: Firstly -- unrelated to your comment -- I've fixed some mistakes in my original answer. Secondly -- when you say that you "need the dump output to be in a specific order for when I restore it", what kind of order do you have in mind? I just played around a bit, and although I don't see anything in the documentation about this, it looks like pg_dump makes a point of dumping tables in an order that's consistent with foreign-key constraints, and I can't imagine what other ordering you could need. Thirdly -- when you restore the data, you can certainly wrap that in a transaction, but I can't imagine why you would need to. I mean usually, if you're restoring from backup, the entire system is offline until everything is restored.
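If you do want the restore itself to be all-or-nothing, psql can wrap the whole script in one transaction; a minimal sketch, assuming the dump went to out.sql as above:

# --single-transaction makes psql run the whole file inside one BEGIN/COMMIT,
# so any error rolls the entire restore back
psql -U postgres -h $IP_ADDRESS -p 5432 --single-transaction -f out.sql MYDB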
By the way, according to the documentation, --inserts "will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL databases." If you're planning to restore the data using straight-up psql, COPY statements might be better.
Thanks for your help. I will switch to using the COPY statements. When I had tried the dump without explicitly looping through each table, there were key constraint errors upon restore, and the order in the dump seemed to be alphabetical. I will revisit that, though. When the documentation says that pg_dump makes consistent backups even when the database is used concurrently, does that mean that it will dump any table data as it was before, say, a concurrent insert? I appreciate you sharing your knowledge with me, it's helping a lot.
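For reference, dropping --inserts is all it takes to get COPY-format data out of pg_dump; a minimal sketch using the same placeholder names as above:

# without --inserts, pg_dump emits COPY statements, which psql restores
# much faster than row-by-row INSERTs
pg_dump -U postgres -h $IP_ADDRESS -p 5432 \
    -t table1 -t table2 -t table3 -a MYDB > out.sql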
