What is the fastest way to fetch data from two tables if one table is referenced from another one in multiple columns?
Consider a table with company names and a table with contracts. Each contract can have a client, an intermediary, and a contractor, in any combination. Each value may be null, and the same company may appear one, two, or three times in a contract row.
The table definitions are:
CREATE TABLE company (id integer, name text);
CREATE TABLE contract (id integer, client integer, intermediary integer, contractor integer);
I've created a SQL fiddle with the test data below: https://www.db-fiddle.com/f/irCodeZjeEPWvhmRwMcHqT/0
Test data:
INSERT INTO company (id,name) VALUES (1,'Company 1');
INSERT INTO company (id,name) VALUES (2,'Company 2');
INSERT INTO company (id,name) VALUES (3,'Company 3');
INSERT INTO company (id,name) VALUES (4,'Company 4');
INSERT INTO company (id,name) VALUES (5,'Company 5');
INSERT INTO contract (id,client,intermediary,contractor) VALUES (1,NULL,NULL,NULL);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (2,NULL,2,3);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (3,1,NULL,NULL);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (4,NULL,2,NULL);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (5,1,2,3);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (6,4,NULL,5);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (7,1,NULL,1);
INSERT INTO contract (id,client,intermediary,contractor) VALUES (7,3,3,3);
Now, using PostgreSQL 9.6, I need a query that returns each contract id together with the name of every company involved. That is pretty easy with correlated subqueries:
SELECT
    id,
    (SELECT name FROM company WHERE id = client) AS "clientName",
    (SELECT name FROM company WHERE id = intermediary) AS "intermediaryName",
    (SELECT name FROM company WHERE id = contractor) AS "contractorName"
FROM contract;
However, in the real world, with a much more complex query, we are running into performance problems. So the question is: is there a way to improve this? Would a JOIN be faster than the subqueries? If so, how would that even work?
Of course, you could do something like
SELECT * FROM contract LEFT JOIN company ON company.id = ANY(ARRAY[client, contractor, intermediary]);
but in that case, the information about which company plays which role in the contract is lost.
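For comparison, a JOIN that keeps the roles apart could be sketched by joining company once per role, each time under its own alias. This is only a sketch, assuming company.id is unique; the aliases ct, cl, im, and co are made up for illustration, and whether it actually beats the subqueries would have to be checked with EXPLAIN ANALYZE on the real query.

```sql
-- Sketch: one LEFT JOIN per role, so NULL roles still yield a contract row.
-- Aliases ct, cl, im, co are illustrative; company.id is assumed to be unique.
SELECT
    ct.id,
    cl.name AS "clientName",
    im.name AS "intermediaryName",
    co.name AS "contractorName"
FROM contract AS ct
LEFT JOIN company AS cl ON cl.id = ct.client
LEFT JOIN company AS im ON im.id = ct.intermediary
LEFT JOIN company AS co ON co.id = ct.contractor;
```

If company.id were not unique, this form would return duplicate contract rows, whereas the subquery form would raise an error, so the two are only interchangeable under that assumption.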
(Edit: In the real world, there are indexes, foreign key constraints, and so on. I've left all of that out here for brevity.)