
I currently have a prod and a test database that live on two Azure PostgreSQL servers. I want to do a nightly backup of the prod database onto test, such that every morning the two are identical. My tables have constraints and keys, so I can't just copy over the data itself; I also need the schemas, which means a simple pandas df.to_sql won't cover it.

My current plan is to run a nightly Azure Functions Python script that does the copying. I tried SQLAlchemy but had significant issues copying the metadata over correctly. Now I am trying to use Postgres's pg_dump and pg_restore/psql commands via a subprocess with the following code:

import subprocess
from sqlalchemy import MetaData

def backup_database(location, database, password, username, backup_file):
    # Use pg_dump command to create a backup of the specified database
    cmd = [
        'pg_dump',
        '-Fc',
        '-f', backup_file,
        '-h', location,
        '-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]
    subprocess.run(cmd, check=True, input=password.encode())

def clear_database(engine, metadata):
    # Drop all tables in the database
    metadata.drop_all(bind=engine, checkfirst=False)

def restore_database(location, database, password, username, backup_file):
    # Use pg_restore command to restore the backup onto the database
    # cmd = ['pg_restore', '-Fc', '-d', engine.url.database, backup_file]
    cmd = [
        'pg_restore',
        '-Fc',
        '-C',
        '-f', backup_file,
        '-h', location,
        #'-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]

    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        print("Backup restored onto the test server.")
    except subprocess.CalledProcessError as e:
        print("Error occurred while restoring the backup:")
        print(e.stdout)  # Print the output from the command
        print(e.stderr)  # Print the error message, if available


# Define backup file path
backup_file = '/pathtofile/backup_file.dump'  # Update with the desired backup file path
backup_file2 = 'backup_file.dump'  # Update with the desired backup file path

# Backup the production database
backup_database(input_host, input_database, input_password, input_user, backup_file)
print("Backup of the production database created.")

# Create metadata object for the test server and reflect its existing tables
# (drop_all only drops tables the MetaData object actually knows about)
output_metadata = MetaData()
output_metadata.reflect(bind=output_engine)

clear_database(output_engine, output_metadata)
print("Test server cleared.")

restore_database(output_host, output_database, output_password, output_user, backup_file2)
print("Backup restored onto the test server.")

This code appears to create a dump file, but it is not successfully restoring to the test database. If I get this code to work, how do I specify file paths within Azure Functions, and is this a suitable solution to run from Azure Functions? If not, how do I get SQLAlchemy to successfully clear the test data/metadata and then copy over the data from prod every night?
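For reference, after reading the pg_restore docs my understanding is that -f names a file for pg_restore to write its output to, not the input dump (the dump path is a positional argument), and with -d commented out it never connects to a database at all, which would explain why nothing shows up on the test server. Note also that the dump is written to backup_file but restored from backup_file2, which points at a different path. Below is a sketch of what I think the corrected call should look like; the PGPASSWORD environment variable (which also works for pg_dump) and tempfile.gettempdir() are untested assumptions about running non-interactively inside Azure Functions, where -W has no terminal to prompt on and the temp directory should be a writable scratch location. I also dropped -C and restore into the existing test database instead:

import os
import subprocess
import tempfile

def restore_database(location, database, password, username, backup_file):
    cmd = [
        'pg_restore',
        '--no-owner',      # skip ownership changes the target server may reject
        '-h', location,
        '-p', '5432',
        '-U', username,
        '-d', database,    # connect to and restore into this database
        backup_file,       # the input dump file is a positional argument, not -f
    ]
    # PGPASSWORD replaces the interactive -W prompt, which has no terminal
    # to read from inside Azure Functions
    env = {**os.environ, 'PGPASSWORD': password}
    subprocess.run(cmd, check=True, capture_output=True, text=True, env=env)

# Azure Functions provides a writable temp directory for scratch files
backup_file = os.path.join(tempfile.gettempdir(), 'backup_file.dump')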

  • What's the problem? If you get an error message, what is it? If you don't get an error, what indicates that it is not successful? Commented Jun 1, 2023 at 17:11
  • @jjanes in my local repo, a dump file appears, but in my database I don't see the data being restored. Even if it did appear, I'm not sure how to modify the code to be compatible with azure functions Commented Jun 1, 2023 at 17:47
  • able to copy data from source to destination Commented Jun 5, 2023 at 16:06

1 Answer


I referred to the MS documentation for Psycopg and PostgreSQL.

import psycopg2

src_conn_string = "SourceConnectionString"
dst_conn_string = "DStConnectionString"

try:
    src_conn = psycopg2.connect(src_conn_string)
    src_cursor = src_conn.cursor()
    print("Connected to source database.")
    try:
        dst_conn = psycopg2.connect(dst_conn_string)
        dst_cursor = dst_conn.cursor()
        print("Connected to destination database.")
        try:
            # List every base table in the source's public schema
            src_cursor.execute(
                "SELECT table_name FROM information_schema.tables "
                "WHERE table_schema='public' AND table_type='BASE TABLE'"
            )
            tables = src_cursor.fetchall()
            for table in tables:
                src_cursor.execute("SELECT * FROM {0}".format(table[0]))
                rows = src_cursor.fetchall()
                for row in rows:
                    # Parameterized placeholders let psycopg2 quote each value
                    # correctly (strings, NULLs, dates, ...)
                    placeholders = ", ".join(["%s"] * len(row))
                    dst_cursor.execute(
                        "INSERT INTO {0} VALUES ({1})".format(table[0], placeholders),
                        row,
                    )
            print("Data transferred successfully.")
        except psycopg2.Error as e:
            print("Error transferring data: ", e)
        finally:
            dst_conn.commit()
            dst_cursor.close()
            dst_conn.close()
            print("Destination database connection closed.")
    except psycopg2.Error as e:
        print("Error connecting to destination database: ", e)
    finally:
        src_cursor.close()
        src_conn.close()
        print("Source database connection closed.")
except psycopg2.Error as e:
    print("Error connecting to source database: ", e)

Output:

(screenshot of the console output)

In Azure:

Source:

(screenshot of the source database tables)

Destination:

(screenshot of the destination database tables)
