I have 1 big table with 9 fields/columns, some of theses columns have a lot of duplicates, so i decided to split them into 2 tables in Postgresql i.e. TEST_A table with columns that have a lot of duplicates & TEST_B table which has unique columns.
I have no issues writing data into TEST_A table, but when i try to write into TEST_B table, Postgresql throws :
insert or update on table "TEST_B" violates foreign key constraint "TEST_A_id_fkey"
DETAIL: Key (session_id)=(XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX) is not present in table "TEST_A".
CREATE TABLE TEST_A(
id VARCHAR ( 100 ) ,
column1 VARCHAR ( 100 ) ,
column2 VARCHAR ( 100 ) ,
column3 VARCHAR ( 100 ) ,
column4 VARCHAR ( 100 ) NOT NULL,
column5 VARCHAR ( 100 ) NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE TEST_B(
m_id VARCHAR ( 100 ) ,
s_id VARCHAR ( 100 ) ,
column11 TIMESTAMP,
column12 TEXT,
column13 TEXT,
column14 TEXT,
PRIMARY KEY(m_id),
FOREIGN KEY(s_id) REFERENCES TEST_A(id) ON DELETE CASCADE ON UPDATE CASCADE
);
I removed the duplicate rows in the Table TEST_A, TEST_A initially had millions of rows, now it only has a few 1000's.
When i try to write to TEST_B i.e. PostgreSQL DB, it will not allow me to write because it is not able to find the unique "id" in the TEST_A table, because it got deleted when we removed the duplicates in Table TEST_A. But unfortunately TEST_B table still has the old "id" still attached to the row.
TEST_B "m_id" needs to track which TEST_A(id) they were a part of. It represents One to Many Relationship (1:M) i.e. Every TEST_B.m_id should have one TEST_A.id & every TEST_A.id should have multiple TEST_B.m_id's.
I did some reading into Postgres DB documentation and I have used DEFERRABLE INITIALLY DEFERRED, it did not work.
CREATE TABLE TEST_A1(
id VARCHAR ( 100 ) ,
column1 VARCHAR ( 100 ) ,
column2 VARCHAR ( 100 ) ,
column3 VARCHAR ( 100 ) ,
column4 VARCHAR ( 100 ) NOT NULL,
column5 VARCHAR ( 100 ) NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE TEST_B1(
m_id VARCHAR ( 100 ) ,
s_id VARCHAR ( 100 ) NOT NULL,
column11 TIMESTAMP,
column12 TEXT,
column13 TEXT,
column14 TEXT,
PRIMARY KEY(m_id),
FOREIGN KEY(s_id) REFERENCES TEST_A1(id) DEFERRABLE INITIALLY DEFERRED
);
Insert query is the same :-
INSERT INTO TEST_B1(m_id, s_id, column11, column12, column13, column14) VALUES ('1234', '2345', NULL, 'SAMPLE_VALUE_1', 'SAMPLE_VALUE_2', 'SAMPLE_VALUE_3');
Result :
ERROR: insert or update on table "test_b1" violates foreign key constraint "test_b1_s_id_fkey"
DETAIL: Key (s_id)=(2345) is not present in table "test_a1".
SQL state: 23503
I'm currently using PySpark with SQLAlchemy to write to Postgresql DB.
Any newbie suggestions/advice is appreciated. Thanks.
Update : Added a sample INSERT QUERY =>
INSERT INTO TEST_B(m_id, s_id, column11, column12, column13, column14) VALUES ('1234', '2345', NULL, 'SAMPLE_VALUE_1', 'SAMPLE_VALUE_2', 'SAMPLE_VALUE_3');
Postgresql DB throws the following error on inserting the above mentioned row :-
ERROR: insert or update on table "test_b" violates foreign key constraint "TEST_B_s_id_fkey"
DETAIL: Key (s_id)=(2345) is not present in table "TEST_A".
SQL state: 23503
s_id is the foreign key linked to id field of TEST_A table, so when (2345) is not present in table "TEST_A", it will throw foreign key constraint error.
Any suggestions/advice is appreciated.