0

Edit: Sorry, I should have explained a tad better, the data is out of salesforce, dumped from the backend, the ID fields are alphanumeric [eg. 00190000010PBdSAAX], generally all are 18 in length & are always unique. I'll make some changes to data types, get rid of the quoted identifiers, and make some changes to the indexes, see how I go!

I am using PostgreSQL 9.5. I'm updating 1 table to another, both tables are identical in structure, one has 2 million records [target] and the other around 70k [source], which is basically just performing an update on existing and inserting any new records using a unique ID to check against.

It's taking a lot longer than I thought, even when there is nothing to update and it just scans over the records, it still takes 5 minutes & even longer when there is something to update, tried with and without indexing, joining the 2 fields in slightly different ways [both with just a WHERE and with actual JOIN]. just want to know if there is a better way of doing it or if I'm doing it just plain wrong, only been using Postgres for few days.

I know 5 minutes is no big deal [longer if it performs any updates], but its a similar process for about 9 other tables & this is a mid size example

both tables look like the below [but with different table names only]

CREATE TABLE public."Cases"
(
  "Past_Due__c" character varying(255),
  "Case_Age__c" character varying(255),
  "Next_Step_Due_Date__c" character varying(255),
  "Id" character varying(255),
  "AccountId" character varying(255),
  "Account_Number__c" character varying(255),
  "Account_Type__c" character varying(255),
  "CaseNumber" character varying(255),
  "CaseSubTypeDetail__c" character varying(255),
  "Case_Sub_Type__c" character varying(255),
  "Case_Type__c" character varying(255),
  "ClosedDate" character varying(255),
  "Collections_Step__c" character varying(255),
  "Customer_Number__c" character varying(255),
  "Next_Collections_Step__c" character varying(255),
  "Origin" character varying(255),
  "Priority" character varying(255),
  "Related_Complaint_Case__c" character varying(255),
  "Status__c" character varying(255),
  "Subject" text,
  "Type" character varying(255),
  "CreatedDate" character varying(255),
  "OwnerId" character varying(255),
  "ContactId" character varying(255),
  "Status" character varying(255),
  "Case_Comments__c" text,
  "Subscription__c" character varying(255),
  "Description" text,
  "Case_Outcome__c" text,
  "Case_Outcome_Reason__c" text,
  "Adjustment_Amount__c" character varying(255),
  "Product_Adjustment_Amount__c" character varying(255),
  "Product_Adjustment_Reason__c" character varying(255),
  "Service__c" character varying(255),
  "ParentId" character varying(255)
)
WITH (
  OIDS=FALSE
);

The update script is below

update public."cases" t2
set past_due__c = t1.past_due__c, case_age__c = t1.case_age__c, next_step_due_date__c = t1.next_step_due_date__c, accountid = t1.accountid, account_number__c = t1.account_number__c, account_type__c = t1.account_type__c, casesubtypedetail__c = t1.casesubtypedetail__c, case_sub_type__c = t1.case_sub_type__c, case_type__c = t1.case_type__c, closeddate = t1.closeddate, collections_step__c = t1.collections_step__c, customer_number__c = t1.customer_number__c, next_collections_step__c = t1.next_collections_step__c, origin = t1.origin, priority = t1.priority, related_complaint_case__c = t1.related_complaint_case__c, status__c = t1.status__c, subject = t1.subject, type = t1.type, ownerid = t1.ownerid, contactid = t1.contactid, status = t1.status, case_comments__c = t1.case_comments__c, subscription__c = t1.subscription__c, description = t1.description, case_outcome__c = t1.case_outcome__c, case_outcome_reason__c = t1.case_outcome_reason__c, adjustment_amount__c = t1.adjustment_amount__c, product_adjustment_amount__c = t1.product_adjustment_amount__c, product_adjustment_reason__c = t1.product_adjustment_reason__c, service__c = t1.service__c, parentid = t1.parentid, billing_account__c = t1.billing_account__c, billing_account_credit_balance__c = t1.billing_account_credit_balance__c, billing_address__c = t1.billing_address__c, lastmodifiedbyid = t1.lastmodifiedbyid, lastmodifieddate = t1.lastmodifieddate
from   public."temp_update_cases" t1
where  t1.id = t2.id

Everything else I need to do I figured out but this one is killing me

4
  • also note that i havent touched any default settings, and i vacuum before running the update Commented Sep 7, 2016 at 9:41
  • Unrelated, but: you should really avoid quoted identifiers. Commented Sep 7, 2016 at 9:42
  • Please edit your question and add the execution plan generated using explain (analyze, verbose). Formatted text please, no screen shots. See also wiki.postgresql.org/wiki/SlowQueryQuestions for more information on how to ask performance related questions. Commented Sep 7, 2016 at 9:44
  • "Id" character varying(255),doesn't look like a good idea (even with a PK constraind or an index on it). BTW: this looks more like a spreadsheet than a database table ... Commented Sep 7, 2016 at 9:45

1 Answer 1

1

Your query is basically this:

update public."cases" t2
set  . . .
from   public."temp_update_cases" t1
where  t1.id = t2.id;

I would suggest indexes:

create index idx_cases_id on public."cases"(id);
create index idx_temp_updte_cases_id on public."temp_update_cases"(id);

Notes:

  • I agree with the comments that the quoted identifiers are not a good idea.
  • Your tables should have some sort of primary key. A column called id is a good candidate.
  • Often, serial is a better option for the primary key than a character string.
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.