Avoid nested loop in PostgreSQL

Question

Consider the query below:

Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN 
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3
ON a2.TeacherID=a3.TeacherID

The subqueries are required to deduplicate the data.

This query runs fine in PostgreSQL. However, if I add a condition between the student and teachers tables, the execution plan shows that Postgres wrongly uses a nested loop join between student and teachers, which have no direct relationship. For example:

Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3
ON a2.TeacherID=a3.TeacherID
WHERE Teacher_Age>=Student_Age

This query takes forever to run. However, if I replace the subqueries with the tables themselves, it runs very fast. Without using temp tables to store the deduplicated results, is there a way to avoid the nested loop in this situation?

Thank you for your help.

Why join the inline queries when you can join the tables themselves? Consider doing a LEFT JOIN instead, and put the condition in the join clause rather than in WHERE.
– Rahul, Commented Oct 7, 2014 at 19:09
The subqueries are needed for deduplication; our dataset contains a lot of duplicates. Also, each table above contains about 3M records.
– toanong, Commented Oct 7, 2014 at 19:11
If you have a lot of duplication in the student, teachers and courses tables, it sounds like a flaw in your schema. The unique identifying attributes should be in one table, and whatever related data causes the duplicates when you select that identifying data should be in one or more other tables.
– gwaigh, Commented Oct 8, 2014 at 3:59
@gwaigh: I can't do anything to change the schema or the data. These tables are the result of multi-site data integration. At each site, each student is unique. However, data about the same student can be stored at multiple sites, so duplicates appear once the data are integrated.
– toanong, Commented Oct 8, 2014 at 11:42

Andrei Volgin · Accepted Answer · 2019-11-01 15:05:55Z

You're making the database perform a lot of unnecessary work to accomplish your goal. Instead of joining three separate SELECT DISTINCT subqueries, try joining the base tables directly to each other and let the database apply DISTINCT only once. If your tables have proper indexes on the ID fields, this should run rather quickly.

SELECT COUNT(1)
FROM (
    SELECT DISTINCT s.Student_ID, c.CourseID, t.TeacherID
    FROM student s
    JOIN courses c ON s.CourseID = c.CourseID
    JOIN teachers t ON c.TeacherID = t.TeacherID
    WHERE t.Teacher_Age >= s.Student_Age
) a
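To check whether the "proper indexes" and the rewrite actually change the plan, something like the following sketch may help. The index names are hypothetical, and the indexed columns are only a guess at what "the ID fields" means here, using the column names from the question. `enable_nestloop` is a standard PostgreSQL planner setting; turning it off only discourages nested loops rather than forbidding them, so it is best treated as a session-level diagnostic and not as a permanent fix.

```sql
-- Hypothetical index names; columns are the join keys used in the question.
CREATE INDEX student_courseid_idx   ON student  (CourseID);
CREATE INDEX courses_courseid_idx   ON courses  (CourseID);
CREATE INDEX courses_teacherid_idx  ON courses  (TeacherID);
CREATE INDEX teachers_teacherid_idx ON teachers (TeacherID);

-- Refresh planner statistics so row estimates reflect the current data.
ANALYZE student;
ANALYZE courses;
ANALYZE teachers;

-- Plan for the rewritten query. EXPLAIN ANALYZE really executes the
-- statement and reports actual row counts and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(1)
FROM (
    SELECT DISTINCT s.Student_ID, c.CourseID, t.TeacherID
    FROM student s
    JOIN courses c ON s.CourseID = c.CourseID
    JOIN teachers t ON c.TeacherID = t.TeacherID
    WHERE t.Teacher_Age >= s.Student_Age
) a;

-- Diagnostic only: discourage nested loops for this session, then look at
-- the plan for the slow query from the question. Plain EXPLAIN does not
-- execute the statement, so this returns immediately.
SET enable_nestloop = off;

EXPLAIN
SELECT count(*)
FROM (SELECT DISTINCT Student_ID, Name, Student_Age, CourseID FROM student) a1
JOIN (SELECT DISTINCT CourseID, CourseName, TeacherID FROM courses) a2
  ON a1.CourseID = a2.CourseID
JOIN (SELECT DISTINCT TeacherID, TeacherName, Teacher_Age FROM teachers) a3
  ON a2.TeacherID = a3.TeacherID
WHERE a3.Teacher_Age >= a1.Student_Age;

-- Restore the default planner behaviour for the rest of the session.
RESET enable_nestloop;
```

If the second plan switches to hash or merge joins and its estimated cost looks reasonable, the nested loop was probably chosen because of poor row estimates for the DISTINCT subqueries.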

edited Nov 1, 2019 at 15:05

Andrei Volgin

answered Feb 7, 2015 at 14:02

RustProof Labs
