
I have a remote PostgreSQL DB with a table (which I cannot alter) containing information (just a few columns, not the entire table) that I want to sync to a local SQL Server 2008 Express database with a table (which I can alter).

Now I'm looking for an efficient way to do this. Since the sync will run every ~5 minutes, reloading the entire table every time would produce a lot of unnecessary traffic, which I'm trying to avoid.

I thought about saving the most recent ID and just grabbing everything that's newer, but it is possible (not likely, but still possible) that older data gets altered. Going that route I would miss the altered data.

The sync will be done by a C# program running on the same machine as the local SQL Server.

  • Does SQL Server 2008 Express come with SQL Server Agent? If so, you could write this as a scheduled job, which would remove the need for a C# application. Also, how many rows are in the table? It may not be that much traffic. Changed data needs to be detectable, and without some sort of marker against the data (a modified date, hash code, etc.) that will be difficult. Not sure if SQL replication works across different database providers, but if it does, that's another option which could minimise the traffic. Commented Jan 31, 2013 at 9:56
  • Are there no "LastModifiedTime" or "InsertTime" columns in the PostgreSQL DB? Commented Jan 31, 2013 at 9:56
  • Unfortunately the PostgreSQL table has no date/modified columns and I cannot add them :( And as far as I know SQL replication only works between SQL Servers. The table has ~100k rows at the moment. Commented Jan 31, 2013 at 10:09
  • Asked the DB admin and he said about 700 bytes per row. Commented Jan 31, 2013 at 11:17

1 Answer


There are two solutions to this problem. You could try to be smart and only transfer the changes, but that requires some integration on the source database; I'm sure your DB admin can help you with that, potentially with triggers that keep track of all touched rows (e.g. by the source table's primary key). This solution scales fairly well, but it's more complicated. I think you should look at a second option: simple brute force.
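
For illustration, the consuming side of that trigger-based variant might look roughly like the sketch below. It assumes your DB admin sets up a change-log table (called sync_changes here, with a bigint row_id column) that the triggers fill with the primary keys of touched rows; those names, and the Npgsql provider, are placeholders rather than anything from your actual setup.

    // Rough sketch: read and clear the hypothetical change log in one round trip.
    using System.Collections.Generic;
    using Npgsql;

    public static class IncrementalSync
    {
        // Returns the primary keys of rows touched since the last run.
        public static List<long> FetchChangedIds(string pgConnString)
        {
            var ids = new List<long>();
            using (var conn = new NpgsqlConnection(pgConnString))
            {
                conn.Open();
                using (var tran = conn.BeginTransaction())
                {
                    // DELETE ... RETURNING hands back the logged keys and empties the
                    // log in one statement, so nothing slips in between read and delete.
                    using (var cmd = new NpgsqlCommand(
                        "DELETE FROM sync_changes RETURNING row_id", conn, tran))
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                            ids.Add(reader.GetInt64(0));
                    }
                    tran.Commit();
                }
            }
            return ids; // re-select these rows from the source table and upsert them locally
        }
    }

That said, the brute-force route below avoids all of that bookkeeping.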

By the sounds of it, your entire table fits comfortably in 100 MB (~100k rows at roughly 700 bytes each is around 70 MB). That just isn't much data. Assuming you can get a transfer rate of 10 MB/s (which isn't outlandish at all), you could transfer everything in about 10 seconds. If, as you say, you only need a few columns, the total transfer could be considerably less. Going with the 10-second number, that's 10 seconds out of every five minutes, so on the order of 3% load to keep the data up to date, and it's a trivial query for the source database that will likely not cause much load, especially since the data set is so small it will all be cached in memory.

Take a look at SqlBulkCopy. This article (Transferring data using SqlBulkCopy) is an example of using it to copy data from one database to the next. The source data reader can be anything; I use it to insert computed data from objects, for instance, but a particularly easy case is a DbDataReader, which you can get for the SELECT statement against PostgreSQL.

Unfortunately, the default options aren't too brilliant, so you'll probably want to specify the SqlBulkCopyOptions useful to you; TableLock is probably not a bad one. Also, if you're doing this in parallel (i.e. multiple bulk inserts into one table), beware of indexes, which can cause deadlocks. If you play with the bulk copy's BatchSize you can tune the trade-off between throughput and memory usage, though the default may work just fine.

Conceptually, then, I'd do this (a full sketch follows below the list):

  • open a connection to your source and target databases (use using)
  • start an SqlTransaction on the target database connection
  • delete all rows from the target table.
  • bulk copy from the source to the target (don't forget to pass the transaction)
  • commit

This way, you'll atomically update the destination table.
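
Here's a minimal sketch of those steps, assuming the Npgsql ADO.NET provider on the PostgreSQL side; the connection strings, table names (remote_table, dbo.LocalCopy) and column names are placeholders for your own schema.

    using System.Data.SqlClient;
    using Npgsql;

    public static class TableSync
    {
        public static void Sync(string pgConnString, string sqlConnString)
        {
            using (var source = new NpgsqlConnection(pgConnString))
            using (var target = new SqlConnection(sqlConnString))
            {
                source.Open();
                target.Open();

                using (var tran = target.BeginTransaction())
                {
                    // 1. Clear the local copy inside the transaction.
                    using (var delete = new SqlCommand("DELETE FROM dbo.LocalCopy", target, tran))
                        delete.ExecuteNonQuery();

                    // 2. Stream only the columns you need from PostgreSQL.
                    using (var select = new NpgsqlCommand(
                        "SELECT id, col_a, col_b FROM remote_table", source))
                    using (var reader = select.ExecuteReader())
                    // 3. Bulk copy into the local table, passing the transaction along.
                    using (var bulk = new SqlBulkCopy(target, SqlBulkCopyOptions.TableLock, tran))
                    {
                        bulk.DestinationTableName = "dbo.LocalCopy";
                        bulk.BatchSize = 10000;      // trade-off between memory and round trips
                        bulk.BulkCopyTimeout = 120;  // seconds; the 30 s default may be too tight
                        bulk.ColumnMappings.Add("id", "Id");
                        bulk.ColumnMappings.Add("col_a", "ColA");
                        bulk.ColumnMappings.Add("col_b", "ColB");
                        bulk.WriteToServer(reader);
                    }

                    // 4. Commit: the destination flips from the old data to the new in one go.
                    tran.Commit();
                }
            }
        }
    }

If the column order and types of your SELECT already line up with the destination table, you can drop the ColumnMappings lines; without explicit mappings SqlBulkCopy maps columns by ordinal position.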

I'm not sure what exactly you're trying to achieve, but if this is some form of caching, consider doing away with the target SQL Server entirely and keeping the data in memory as an array of objects. In-memory access to read-only data is super fast, and your data set easily fits in memory.
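
As a rough sketch of that idea (again assuming Npgsql and placeholder table/column names), a refresh could simply rebuild a list and swap it in atomically:

    using System.Collections.Generic;
    using Npgsql;

    public class CachedRow
    {
        public long Id;
        public string ColA;
    }

    public static class InMemoryCache
    {
        private static volatile List<CachedRow> _rows = new List<CachedRow>();

        // Readers always see a complete snapshot; the reference swap below is atomic.
        public static List<CachedRow> Rows { get { return _rows; } }

        public static void Refresh(string pgConnString)
        {
            var fresh = new List<CachedRow>();
            using (var conn = new NpgsqlConnection(pgConnString))
            {
                conn.Open();
                using (var cmd = new NpgsqlCommand("SELECT id, col_a FROM remote_table", conn))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        fresh.Add(new CachedRow { Id = reader.GetInt64(0), ColA = reader.GetString(1) });
                }
            }
            _rows = fresh; // swap in the new snapshot
        }
    }

Run Refresh from a timer every ~5 minutes and read from Rows everywhere else.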
