
I have an application at Location A (LA-MySQL) that uses a MySQL database, and another application at Location B (LB-PSQL) that uses a PostgreSQL database. (By "location" I mean physically distant places on different networks, if that matters.)

I need to update one table at LB-PSQL so that it stays synchronized with LA-MySQL, but I don't know exactly what the best practices in this area are.

Also, the table I need to update at LB-PSQL does not necessarily have the same structure as the one at LA-MySQL. (But I don't think that's a problem, since the fields I need to update on LB-PSQL can accommodate the data from the LA-MySQL fields.)

Given this, what are the best practices, usual methods, or references for doing this kind of thing?

Thanks in advance for any feedback!

2 Comments

  • Do you need it synchronous (slow; changes are visible on the replica at the moment of the master commit) or asynchronous (fast, but changes on the replica may only become visible after some delay)? If asynchronous, what kind of delay can you live with: a couple of seconds, a couple of hours, a day? Commented Jan 12, 2011 at 17:27
  • @Tometzky, it can perfectly well be an asynchronous task. As for the delay, I can live with anything between a day and a week. Commented Jan 12, 2011 at 17:42

3 Answers


If both servers are on different networks, the only option I see is to export the data from MySQL into a flat file.

Then transfer the file (e.g. via FTP or something similar) to the PostgreSQL server and import it there using COPY.

I would recommend importing the flat file into a staging table. From there you can use SQL to move the data to the appropriate target table. That gives you the chance to do data conversion or to update existing rows.
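A minimal sketch of that pipeline, assuming a tab-separated export and placeholder host, credential, table, and column names (staging_table and target_table are hypothetical):

# 1) On the MySQL side: export the relevant columns to a flat (tab-separated) file
#    (this simple approach assumes the data contains no embedded tabs, newlines, or NULLs)
mysql --batch --skip-column-names --host="LA_hostname" --user="username" \
  "databasename" -e "SELECT id, col1, col2 FROM source_table" > /tmp/export.tsv

# 2) Transfer /tmp/export.tsv to the PostgreSQL server (FTP, scp, or similar)

# 3) On the PostgreSQL side: load into the staging table, then update/insert the target
psql --username="username" "databasename" <<'EOF'
BEGIN;
TRUNCATE staging_table;
\copy staging_table (id, col1, col2) FROM '/tmp/export.tsv'

-- update rows that already exist in the target, converting columns as needed
UPDATE target_table t
   SET col_a = s.col1, col_b = s.col2
  FROM staging_table s
 WHERE t.id = s.id;

-- insert the rows that are new
INSERT INTO target_table (id, col_a, col_b)
SELECT s.id, s.col1, s.col2
  FROM staging_table s
 WHERE NOT EXISTS (SELECT 1 FROM target_table t WHERE t.id = s.id);

COMMIT;
EOF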

If that transformation is more complicated, you might want to think about using an ETL tool (e.g. Kettle) to do the migration on the target server.


2 Comments

Thank you, but I was expecting some kind of "automated process" solution; exporting, transferring via FTP, and importing isn't ideal.
There will be no 100% automated process without some up-front work. You will have to do something. You could look at an ETL tool (Pentaho, maybe), as they are designed for this type of task, but it still requires some development work to create the ETL process.

Just create a script on LA that will do something like this (bash sample):

#!/bin/bash
# fail the pg_dump check if any command in the pipeline fails, not just the last one
set -o pipefail

TMPFILE=$(mktemp) || { echo "mktemp failed" 1>&2; exit 1; }

# dump the table as portable INSERT statements, stripping pg_dump's leading SET/header lines
pg_dump --column-inserts --data-only --no-password \
  --host="LB_hostname" --username="username" \
  --table="tablename" "databasename" \
  | awk '/^INSERT/ {i=1} {if (i) print} # ignore everything up to the first INSERT' \
  > "$TMPFILE" \
  || { echo "pg_dump failed" 1>&2; exit 1; }

# reload the table on the other side: truncate, then replay the INSERTs
( echo "begin; truncate tablename;"; cat "$TMPFILE"; echo 'commit;' ) \
  | mysql "databasename" \
  || { echo "mysql failed" 1>&2; exit 1; }

rm "$TMPFILE"

And set it to run, for example, once a day from cron. You'd need a .pgpass file for the PostgreSQL password and a MySQL option file for the MySQL password.
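For example (the script path, schedule, hostnames, and credentials below are placeholders):

# crontab entry: run the sync script every day at 03:00
0 3 * * * /usr/local/bin/sync_table.sh >> /var/log/sync_table.log 2>&1

# ~/.pgpass (must be chmod 0600), format hostname:port:database:username:password
LB_hostname:5432:databasename:username:secret

# ~/.my.cnf, read by the mysql client
[client]
user=username
password=secret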

This should be fast enough for less than a million rows.

6 Comments

Could you explain the awk line, please?
There's a comment: "ignore everything up to the first INSERT". pg_dump generates some configuration lines that are incompatible with other databases; the awk program ignores everything until a line starting with "INSERT" shows up.
I've read the comment ;) I was more interested in the why ("pg_dump generates some lines for [...]").
@Tometzky: doesn't this do it the wrong way round? To my understanding MySQL should be the source and Postgres the target. Your solution uses Postgres as the source as far as I can tell.
@a_horse_with_no_name: eee... implementing this in the correct direction is left as an exercise for the reader ;-)

Not a turnkey solution, but here is some code to help with this task using triggers. For brevity, the following assumes no deletes or updates. Needs PostgreSQL >= 9.1.

1) Prepare two new tables, mytable_a and mytable_b, with the same columns as the source table to be replicated:

CREATE TABLE  mytable_a AS TABLE mytable WITH NO DATA;
CREATE TABLE  mytable_b AS TABLE mytable WITH NO DATA;

-- trigger function which copies data from mytable to mytable_a on each insert
CREATE OR REPLACE FUNCTION data_copy_a() RETURNS trigger AS $data_copy_a$
    BEGIN
    INSERT INTO mytable_a SELECT NEW.*;
        RETURN NEW;
    END;
$data_copy_a$ LANGUAGE plpgsql;

-- start trigger
CREATE TRIGGER data_copy_a AFTER INSERT ON mytable FOR EACH ROW EXECUTE PROCEDURE data_copy_a();

Then when you need to export:

-- move data from mytable_a -> mytable_b without stopping trigger
WITH d_rows AS (DELETE FROM mytable_a RETURNING * )  INSERT INTO mytable_b SELECT * FROM d_rows; 

-- export data from mytable_b -> file
\copy mytable_b to '/tmp/data.csv' WITH DELIMITER ',' csv; 

-- empty table
TRUNCATE mytable_b;

Then you can import data.csv into MySQL.
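For example, with LOAD DATA (a sketch; the file path and table name are taken from the steps above, and LOCAL requires local_infile to be enabled on both server and client):

-- load the CSV produced by \copy into the MySQL table
LOAD DATA LOCAL INFILE '/tmp/data.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';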

