SQL JOIN to the same table explanation [closed]

Question

Closed. This question needs debugging details. It is not currently accepting answers.

I am refactoring some code that was written by another person who is no longer around. In this query a MAX(Date) is being selected, but then joined to the same table via MAX(Locator). The locator is an arbitrary sequential number that is assigned when a record is created, so essentially the MAX(Locator) would return the most recent record, the same as MAX(Date). Is there any purpose to this seeming redundancy?

SELECT DISTINCT
    MAX(T.USERDATE2) AS BillPayDate
INTO
    #PersonBillPayAccounts
FROM 
    [TRACKING] T
JOIN
    (SELECT ACCOUNT_NUMBER, MAX(Locator) AS Locator
     FROM [TRACKING]
     WHERE Type = 32
       AND (userdate2 IS NOT NULL OR userdate2 != '') 
     GROUP BY ACCOUNT_NUMBER) L ON T.ACCOUNT_NUMBER = L.ACCOUNT_NUMBER 
                                AND T.LOCATOR = L.Locator 
                                AND T.Type = 32

Why not rewrite it and see what happens? To me, the whole thing looks really bad.
– siggemannen, Commented Jul 25, 2024 at 18:44
And what is the purpose of T.Type = 32 in all of this?
– Luuk, Commented Jul 25, 2024 at 18:55
To me the second join to TRACKING is doing something. It ensures that the only userdate2 values considered are those on the row with the max locator for each account, and only where userdate2 is populated. So if someone were to update the date field on an older record, the max date for that account would no longer match the date on the max(locator) row. By joining, you guarantee you get the date that belongs to the max locator. I'd prefer the whole second join be written as a correlated subquery using EXISTS so it can exit early, and I like business limits in WHERE clauses.
– xQbert, Commented Jul 25, 2024 at 19:05
If you run SELECT MAX(LOCATOR) maxLOC, MAX(Userdate2) maxBillPayDate, Account_number FROM TRACKING WHERE Type = 32 AND (userdate2 IS NOT NULL OR userdate2 != '') GROUP BY Account_number EXCEPT SELECT LOCATOR, Userdate2, Account_number FROM TRACKING WHERE Type = 32 AND (userdate2 IS NOT NULL OR userdate2 != '') and get any records back, then you have accounts whose max bill pay date doesn't sit on the max locator row. The first query gets the max locator and max bill pay date by account. The second query returns each record as is, so the result will show whether you made a bad assumption.
– xQbert, Commented Jul 25, 2024 at 19:15
Are there any unique constraints on the table? A primary key or a unique index? What data type is userdate2? Why is this date treated as a string in the query? Which columns are nullable, which aren't? Can you show the table description (the CREATE TABLE statement)?
– Thorsten Kettner, Commented Jul 25, 2024 at 23:13
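
The consistency check suggested in the comments can be laid out as a small T-SQL script. This is only a sketch: the table variable @Tracking and its four rows are hypothetical stand-ins for the real TRACKING table, and the column types are guesses. The predicate is parenthesized as in the question's query so that Type = 32 applies to both sides of the OR.

```sql
-- Hypothetical stand-in for TRACKING; column types are assumptions.
DECLARE @Tracking TABLE (
    Account_Number int,
    Locator        int,
    UserDate2      varchar(8),
    Type           int
);

INSERT INTO @Tracking (Account_Number, Locator, UserDate2, Type) VALUES
    (1, 1, '20240725', 32),  -- older record carries the later date
    (1, 2, '20240724', 32),
    (2, 3, '20240721', 32),
    (2, 4, '20240722', 32);

-- Per account: highest locator paired with highest date ...
SELECT MAX(Locator) AS maxLOC, MAX(UserDate2) AS maxBillPayDate, Account_Number
FROM @Tracking
WHERE Type = 32 AND (UserDate2 IS NOT NULL OR UserDate2 != '')
GROUP BY Account_Number
EXCEPT
-- ... minus every row exactly as stored.
SELECT Locator, UserDate2, Account_Number
FROM @Tracking
WHERE Type = 32 AND (UserDate2 IS NOT NULL OR UserDate2 != '');
```

With these made-up rows, account 1 comes back (its latest date is not on its highest locator) and account 2 does not. An empty result against the real table would suggest MAX(Locator) and MAX(USERDATE2) currently agree, though it says nothing about future updates.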

Joel Coehoorn · Accepted Answer · 2024-07-25 19:58:11Z

2

Say you have data like this:

Account_Number	Locator	UserDate2	Type
1	A	20240725	32
1	B	20240724	32
2	A	20240721	32
2	B	20240722	32

The JOIN forces you to only use dates from the greatest locator within each account. So you end up using July 24 instead of July 25 from account 1, and July 22 instead of July 21 from account 2 (though the account 2 value is then discarded by the outer MAX() aggregation, leaving July 24 as the result).
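
To make that concrete, here is a sketch that loads the four sample rows into a table variable and runs the question's query shape next to a plain MAX(). The name @Tracking and the column types are assumptions for illustration (UserDate2 is kept as a string because the original query compares it to ''), and the INTO #PersonBillPayAccounts clause is left out.

```sql
-- Sample rows from the table above; types are assumptions.
DECLARE @Tracking TABLE (
    Account_Number int,
    Locator        char(1),
    UserDate2      varchar(8),
    Type           int
);

INSERT INTO @Tracking (Account_Number, Locator, UserDate2, Type) VALUES
    (1, 'A', '20240725', 32),
    (1, 'B', '20240724', 32),
    (2, 'A', '20240721', 32),
    (2, 'B', '20240722', 32);

-- Self-join version: only the row with the highest locator per account counts.
SELECT MAX(T.UserDate2) AS BillPayDate
FROM @Tracking T
JOIN (
    SELECT Account_Number, MAX(Locator) AS Locator
    FROM @Tracking
    WHERE Type = 32 AND UserDate2 IS NOT NULL
    GROUP BY Account_Number
) L ON T.Account_Number = L.Account_Number
   AND T.Locator = L.Locator
   AND T.Type = 32;
-- returns 20240724

-- Plain aggregate: every row counts.
SELECT MAX(UserDate2) AS BillPayDate
FROM @Tracking
WHERE Type = 32;
-- returns 20240725
```

The two results differ only because account 1 has a later date on an older locator. If that can never happen in the real data, the join is redundant.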

But that's the old way. The newer way uses Window Functions, and should perform much better:

WITH Dates As (
    SELECT
        FIRST_VALUE(UserDate2) OVER (PARTITION BY Account_Number ORDER BY Locator DESC) BillPayDate
    FROM Tracking
    WHERE UserDate2 IS NOT NULL AND Type=32
)
SELECT Max(BillPayDate) BillPayDate
FROM Dates

See it work here:

https://dbfiddle.uk/XqR-FhH-
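
If the fiddle is unavailable, the intermediate rows of the CTE can be inspected with a sketch like the one below. It reuses the sample rows from earlier in this answer; the table variable @Tracking and its column types are assumptions.

```sql
-- Sample rows from this answer; types are assumptions.
DECLARE @Tracking TABLE (
    Account_Number int,
    Locator        char(1),
    UserDate2      varchar(8),
    Type           int
);

INSERT INTO @Tracking (Account_Number, Locator, UserDate2, Type) VALUES
    (1, 'A', '20240725', 32),
    (1, 'B', '20240724', 32),
    (2, 'A', '20240721', 32),
    (2, 'B', '20240722', 32);

-- Same window expression as the CTE, with the source columns kept for context.
SELECT
    Account_Number,
    Locator,
    UserDate2,
    FIRST_VALUE(UserDate2) OVER (PARTITION BY Account_Number
                                 ORDER BY Locator DESC) AS BillPayDate
FROM @Tracking
WHERE UserDate2 IS NOT NULL AND Type = 32;
-- BillPayDate is 20240724 on both account 1 rows
-- and 20240722 on both account 2 rows.
```

Every row in a partition carries the date from that account's highest locator, so the outer MAX(BillPayDate) only has to pick across accounts and returns 20240724 here.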

By the way, the userdate2 != '' excerpt tells us this schema is broken. It is not okay to use varchar columns to store dates. Also, the DISTINCT was meaningless in the context of a MAX() aggregation. Finally, I strongly doubt a temp table was a good use of this result.
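
A related detail about that predicate: because of SQL's three-valued logic, (userdate2 IS NOT NULL OR userdate2 != '') behaves the same as userdate2 IS NOT NULL alone, so empty strings still pass the filter. Whether AND was intended is a guess; the sketch below just shows how the two forms treat a NULL, an empty string, and a date string. The inline VALUES rows and the Label column are made up for illustration.

```sql
-- Compare the question's OR predicate with an AND variant on three test values.
SELECT
    v.Label,
    CASE WHEN v.UserDate2 IS NOT NULL OR v.UserDate2 != ''
         THEN 'kept' ELSE 'filtered out' END AS OrPredicate,
    CASE WHEN v.UserDate2 IS NOT NULL AND v.UserDate2 != ''
         THEN 'kept' ELSE 'filtered out' END AS AndPredicate
FROM (VALUES
    ('NULL',         CAST(NULL AS varchar(8))),
    ('empty string', ''),
    ('date string',  '20240725')
) AS v (Label, UserDate2);
-- NULL:         filtered out / filtered out
-- empty string: kept         / filtered out
-- date string:  kept         / kept
```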

edited Jul 25, 2024 at 19:58

answered Jul 25, 2024 at 19:26

2 Comments

xQbert Over a year ago

At first I thought the DISTINCT was meaningless too. But the inline view groups by account number with MAX(Locator), so the join yields one row per account, and several accounts could then share the same bill pay date. So the DISTINCT could reduce the number of dates, and with it the row count in the results, since there is no GROUP BY on the date in the outer query.

Joel Coehoorn Over a year ago

@xQbert There is only one record in the results with or without the DISTINCT. dbfiddle wasn't responding, but you can see it at this link: sqlfiddle.com/sql-server/… If the GROUP BY were on the whole query instead of just the derived table, we might see multiple rows.
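
The point in this exchange can be checked with a short sketch. An aggregate with no GROUP BY on the outer query always returns exactly one row, so DISTINCT has nothing to remove. Grouping the outer query by account is what produces several rows. The table variable @Tracking and its column types are assumptions, with rows taken from the answer's sample data.

```sql
-- Sample rows from the answer; types are assumptions.
DECLARE @Tracking TABLE (
    Account_Number int,
    Locator        char(1),
    UserDate2      varchar(8),
    Type           int
);

INSERT INTO @Tracking (Account_Number, Locator, UserDate2, Type) VALUES
    (1, 'A', '20240725', 32),
    (1, 'B', '20240724', 32),
    (2, 'A', '20240721', 32),
    (2, 'B', '20240722', 32);

-- Outer aggregate without GROUP BY: a single row, DISTINCT or not.
SELECT DISTINCT MAX(T.UserDate2) AS BillPayDate
FROM @Tracking T
JOIN (
    SELECT Account_Number, MAX(Locator) AS Locator
    FROM @Tracking
    WHERE Type = 32 AND UserDate2 IS NOT NULL
    GROUP BY Account_Number
) L ON T.Account_Number = L.Account_Number
   AND T.Locator = L.Locator
   AND T.Type = 32;
-- one row: 20240724

-- Outer query grouped by account: one row per account.
SELECT T.Account_Number, MAX(T.UserDate2) AS BillPayDate
FROM @Tracking T
JOIN (
    SELECT Account_Number, MAX(Locator) AS Locator
    FROM @Tracking
    WHERE Type = 32 AND UserDate2 IS NOT NULL
    GROUP BY Account_Number
) L ON T.Account_Number = L.Account_Number
   AND T.Locator = L.Locator
   AND T.Type = 32
GROUP BY T.Account_Number;
-- two rows: (1, 20240724) and (2, 20240722)
```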
