
Our Java application stores data pertaining to multiple tenants in a single DB instance on AWS RDS. This setup is replicated across regions in prod. There are multiple schemas, and each table within those schemas has a tenantId field. It is not necessarily the primary key, but all tables have an index on the tenantId field.

Now the requirement is to select the data stored for one tenant, across all tables, and re-insert it with a different tenantId.

To clarify: say I have 3 tables, tableA, tableB, and tableC, each containing R rows for tenant_A. Now a different tenant_B gets created in the system, and the requirement is that each of the R rows across all three tables gets created in those same 3 tables with tenantId = tenant_B. This has to happen as a bootstrap DB sync step before tenant_B starts its own lifecycle in our system.
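In a single-region setup, the per-table operation boils down to a plain INSERT ... SELECT. A minimal sketch, assuming hypothetical column names col1 and col2:

```sql
-- Clone tenant_A's rows as tenant_B within one table. The column list is
-- a placeholder; a real statement would enumerate every non-generated
-- column, and the tables would be processed in FK-dependency order.
INSERT INTO tableA (tenantId, col1, col2)
SELECT 'tenant_B', col1, col2
FROM tableA
WHERE tenantId = 'tenant_A';
-- repeat for tableB and tableC
```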

One option I was considering is to create a one-time pg_dump of all the tables, upload it to S3 in per-tenant buckets, and whenever a new tenant (like tenant_B) comes along, restore from the SQL for its parent tenant (tenant_A in our example), replacing the tenantId field with tenant_B in all those inserts.
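If the dump route is taken, one way to avoid string-replacing tenant IDs inside the dump file itself (fragile if the ID can also appear inside data values) is to load the dump into a staging schema and rewrite the tenantId as rows move into the live tables. A sketch, where the staging schema and the column list are assumptions:

```sql
-- Assumes the parent tenant's dump has been restored into staging.*;
-- the ID substitution then happens in SQL rather than by text-editing
-- the dump file.
INSERT INTO public.tableA (tenantId, col1, col2)
SELECT 'tenant_B', col1, col2
FROM staging.tableA
WHERE tenantId = 'tenant_A';
```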

But the concern is that our data is huge, and a daily pg_dump might unnecessarily overload the DB instance (if it has concerning overhead at all). I was looking at whether we can stream incremental updates on top of the one-time dump and that way keep the dump up to date. However, streaming WAL has its own disadvantage: what if replication/streaming stops and the logs start filling up the DB storage? Is there a more elegant way of doing this?
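On the "what if streaming stops" concern: with logical replication it is the replication slot that pins WAL on disk, and the amount retained can be watched so a stalled consumer is caught before storage fills. A sketch of the relevant built-ins (the slot name is made up):

```sql
-- Create a logical replication slot for streaming incremental changes.
SELECT pg_create_logical_replication_slot('tenant_dump_feed', 'pgoutput');

-- How much WAL each slot is currently holding back; alert on growth.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
-- On PostgreSQL 13+, max_slot_wal_keep_size caps how much WAL a slot
-- may retain, invalidating the slot instead of filling the disk.
```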

Another option is to fire pg_dump only when a child tenant (like tenant_B) gets created, so that we trigger the dump for its parent tenant_A, upload it to S3, and restore it. However, with more and more concurrent tenants getting created in quick succession, this might not be a feasible solution in the long run.

Why are we uploading the pg_dump to S3: we might need to run it across geos. For example, the dump for tenant_A comes from us-west-2 while tenant_B might get created in eu-central-1. Our internal logic handles tenant creation, and the region boundary is not guaranteed.

Any help here would be hugely appreciated. Thanks!

aws_s3.query_export_to_s3: can the aws_s3 plug-in be of any help here? I haven't worked with it before, hence I am not aware of its performance. Can someone point me to the right resources around it? Thanks.
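From the RDS PostgreSQL docs, the extension pairs aws_s3.query_export_to_s3 (export) with aws_s3.table_import_from_s3 (import), which maps onto this use case roughly as follows; the bucket name, key layout, and staging table are assumptions:

```sql
-- On the us-west-2 instance: export tenant_A's rows for one table.
-- Prerequisites: CREATE EXTENSION aws_s3 CASCADE; and an IAM role that
-- grants the instance access to the bucket.
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM tableA WHERE tenantId = ''tenant_A''',
    aws_commons.create_s3_uri('tenant-dumps', 'tenant_A/tableA.csv', 'us-west-2'),
    options := 'format csv'
);

-- On the eu-central-1 instance: import into a staging table (the region
-- argument names the bucket's region, so the cross-region read works),
-- then re-insert with tenantId = 'tenant_B' as in the staging sketch above.
SELECT aws_s3.table_import_from_s3(
    'tableA_staging', '', '(format csv)',
    aws_commons.create_s3_uri('tenant-dumps', 'tenant_A/tableA.csv', 'us-west-2')
);
```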

  • What I would do: Write code that reads the catalog to generate insert ... select statements and then execute them in an order that works with your FK relationships (see the sketch after these comments). Commented Sep 12, 2023 at 17:50
  • Thanks @MikeOrganek ! However, the problem here is the cross-region sync. Basically the requirement is to perform the select ... part in the us-west-2 DB instance (that of tenant_A) and insert it in eu-central-1 (where tenant_B resides). Each tenant, however, resides in one and only one region. So that's why a global S3 bucket that is accessible across geo regions is what I am thinking of going ahead with. Commented Sep 12, 2023 at 18:08
  • My apologies. I made the incorrect assumption that your databases were replicating across regions already. I would still approach the problem the same way except using copy and intermediate files. In other words, I would treat this as a development project instead of hoping to find a tool. Commented Sep 13, 2023 at 12:26
  • I would also recommend asking this question on dba.stackexchange.com Commented Sep 13, 2023 at 12:32
  • Already posted but no activity yet - dba.stackexchange.com/questions/331130/… Commented Sep 13, 2023 at 12:35
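For reference, a sketch of the catalog-driven generation suggested in the first comment: emit one INSERT ... SELECT per table that has the tenant column. The lower-case column name tenantid and the literal tenant IDs are assumptions:

```sql
-- Generate an INSERT ... SELECT per table containing a tenantid column
-- (adjust the name if the column was created as a quoted "tenantId").
-- The emitted statements still need to run in FK-dependency order.
SELECT format(
         'INSERT INTO %I.%I (%s) SELECT %s FROM %I.%I WHERE tenantid = %L;',
         table_schema, table_name,
         string_agg(quote_ident(column_name), ', ' ORDER BY ordinal_position),
         string_agg(CASE WHEN column_name = 'tenantid'
                         THEN quote_literal('tenant_B')
                         ELSE quote_ident(column_name) END,
                    ', ' ORDER BY ordinal_position),
         table_schema, table_name, 'tenant_A')
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
GROUP BY table_schema, table_name
HAVING bool_or(column_name = 'tenantid');
```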
