I am bulk-inserting some data into a Postgres DB on RDS, and it's taking longer than I would like.
A migration to initialise the DB schema would look like this:
CREATE TABLE "addresses" ("address" TEXT NOT NULL);
CREATE UNIQUE INDEX "unique_address" ON "addresses"("address");
CREATE INDEX "autocomplete_index" ON "addresses" USING btree (lower(address) text_pattern_ops);
The data comes from S3, where I have a collection of around 800 CSV files of roughly 256 MB each. For each CSV file, I use the aws_s3.table_import_from_s3 function to copy the data into a temporary table; this part is very fast. Then I insert from the temporary table into my addresses table like this:
INSERT INTO addresses
SELECT * FROM temp_addresses
ON CONFLICT (address) DO NOTHING;
This INSERT takes about 90 minutes to import a single 256 MB CSV file.
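For completeness, the per-file copy into the temporary table is a call along these lines (the bucket name, key, and region below are placeholders, and the instance's IAM role is assumed to handle S3 credentials):

-- Import one CSV file from S3 into the temp table.
-- 'my-bucket', 'addresses/part-0001.csv', and 'eu-west-1' are placeholders;
-- the real values depend on where each file lives in S3.
SELECT aws_s3.table_import_from_s3(
    'temp_addresses',      -- target table
    '',                    -- column list ('' = all columns)
    '(FORMAT csv)',        -- COPY options
    aws_commons.create_s3_uri('my-bucket', 'addresses/part-0001.csv', 'eu-west-1')
);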
From the Performance Insights page it seems like the bottleneck is I/O. (This is what I infer from the wait-event bars being dominated by "IO:DataFileRead".)
The DB instance is a db.t3.small with 2 vCPUs and 2 GB RAM, and 1024 GB of gp3 storage with 12,000 provisioned IOPS and 500 MiB/s throughput.
From what I can tell, I am far below the limit in terms of IO throughput:
...and I also seem to be well below the limit in terms of IOPS:
...so I'm struggling to understand what the bottleneck is here. What am I missing?
Extra notes:
Here is a chart of the CPU usage during the load:
And here's one of Freeable memory during the load:

Use COPY instead of INSERT, which is comparatively fast, and disable non-essential indexes during the bulk load, then rebuild them afterward. There are other optimizations you can research too, but most likely your provisioned throughput isn't being utilized because of the small instance size.
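As a sketch of the index part, using the definitions from your question (the unique index has to stay because ON CONFLICT (address) depends on it, so only the autocomplete index gets dropped):

-- Drop the non-essential index before the bulk load.
DROP INDEX "autocomplete_index";

-- ...import each CSV file and run the INSERT ... SELECT for it...

-- Rebuild the index once, after all files have been loaded.
CREATE INDEX "autocomplete_index" ON "addresses" USING btree (lower(address) text_pattern_ops);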