2

I need some help handling NULL columns in joins. I have three tables (the ones below are just examples): Map, Employee and Region. Map is the primary table; I join Employee to it on the mapid column to get the EmployeeID, and then join Region to Employee on EmployeeID.

Requirement: for any given mapID, check if there are any region for EmployeeID. If there is a match, pick it. Only if EmployeeID is null, then check for zip5 and pick the region.

I tried the code shown here; for mapID = 7890, it returns the region for both EmployeeID NULL and 200. For this mapid an employee with EmployeeID = 200 exists, so only that region should be returned and the other one ignored.

For mapID = 4567, EmployeeID is NULL and the correct value is picked.

Can anyone tell me what my mistake is?

Expected output:

mapid  Prodcode  proddesc  amount   zip5   region
7890   23458     POT       1234789  45678  West S San Diago

Actual output:

mapid  Prodcode  proddesc  amount   zip5   region
7890   23458     POT       1234789  45678  West S San Diago
7890   23458     POT       1234789  45678  West So. CA

DROP TABLE IF EXISTS #map

CREATE TABLE #map
(
    zip5 varchar(10),
    mapid varchar(10),
    Prodcode int,
    proddesc varchar(10),
    amount float
)

DROP TABLE IF EXISTS #employee

CREATE TABLE #employee
(
    EmployeeID varchar(10),
    mapid varchar(10)
)

DROP TABLE IF EXISTS #Region

CREATE TABLE #Region
(
    mapid varchar(10),
    EmployeeID varchar(10),
    zip5 varchar(10),
    Region varchar(50)
)

INSERT INTO #map
    SELECT '04987', '9879', 24567, 'ISC', '17645.00'
    UNION
    SELECT '45678', '7890', 23458, 'POT', '1234789.00'
    UNION
    SELECT '56333', '5678', 24567, 'MHT', '23400.00'
    UNION
    SELECT '00899', '4567', 24567, 'PIT', '1234.00'
    UNION
    SELECT '00899', '3457', 24567, 'ISC', '17645.00'

INSERT INTO #employee
    SELECT '100', '9879'
    UNION
    SELECT '200', '7890'
    UNION
    SELECT '400', '5678'
    UNION
    SELECT NULL, '4567'
    UNION
    SELECT '500', '3457'
 

INSERT INTO #Region
    SELECT '9879', '100', '04987', 'South'
    UNION
    SELECT '7890', '200', '45678', 'West S San Diago'
    UNION
    SELECT '7890', NULL, '45678', 'West So. CA'
    UNION
    SELECT '5678', '400', '56333', 'EastCentral'
    UNION
    SELECT '4567', NULL, '00899', 'south'

SELECT * FROM #employee WHERE mapid = '7890'

SELECT * FROM #map WHERE mapid = '7890'

SELECT * FROM #Region WHERE mapid = '7890'

SELECT DISTINCT
    m.*, region 
FROM
    #map m
LEFT JOIN
    #Region r ON r.mapid = m.mapid
LEFT JOIN
    #employee e ON e.mapid = m.mapid
                AND r.employeeid = ISNULL(e.employeeid, '') 
                 OR (r.employeeid <> ISNULL(e.employeeid, '') 
                     AND r.zip5 = m.zip5)
WHERE
    m.mapid = '7890'

(db<>fiddle with the whole case)

  • I think you need some extra parentheses here (just after the first AND): left join #employee e on e.mapid=m.mapid and (r.employeeid=isnull(e.employeeid,'') OR (r.employeeid<>isnull(e.employeeid,'') and r.zip5=m.zip5)) so that the e.mapid=m.mapid condition is always taken into account, regardless of everything that follows. Commented Oct 29 at 15:55
  • It feels like this is being queried in the wrong order. Are you trying to find all employees with map 7890, and then find their regions? Thus one row per employee? Commented Oct 29 at 16:28
  • This matches my understanding, but I find your explanation a bit confusing; dbfiddle.uk/5b74HBbC Commented Oct 29 at 16:46
  • I suggest expanding your minimal reproducible example to demonstrate all the behaviours you need (not with final code, just source data and required output, with explanations for each output row). Commented Oct 29 at 16:47
  • "for any given mapID, check if there are any region for EmployeeID" is very hard to understand. The map table does not contain an employee ID, so what EmployeeID are you referring to? Do you mean the employee row(s) that have the same map ID? Can there be only one or many? What if no employee at all has that map ID? And if it's possible to get many employees, does it suffice to find a region for at least one of them? I'm lost. Commented Oct 29 at 18:33

2 Answers

1

Testing for NULL in the WHERE of an anti-JOIN

You have put your condition in the JOIN … ON instead of in the WHERE.
The rule for an anti-JOIN is that you test your possibly NULL column (from the LEFT JOIN) in the WHERE.

In other words, instead of asking "join with employee, then keep only the rows where there is an employee match, or no employee match but a ZIP match",
you're asking "join only if there is an employee match, or no employee match but a ZIP match, then return every row without any further condition".

Thus simply changing your:

on e.mapid=m.mapid
and r.employeeid=isnull(e.employeeid,'') OR (r.employeeid<>isnull(e.employeeid,'') and r.zip5=m.zip5)
where m.mapid='7890'

to:

on e.mapid=m.mapid
where (r.employeeid=isnull(e.employeeid,'') OR (r.employeeid<>isnull(e.employeeid,'') and r.zip5=m.zip5))
and m.mapid='7890'

returns the expected result:

zip5   mapid  Prodcode  proddesc  amount   region
45678  7890   23458     POT       1234789  West S San Diago

(second query in this db<>fiddle)
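
A side note on the original ON clause: AND binds more tightly than OR in T-SQL, so without parentheses the e.mapid = m.mapid test only guards the first branch of the condition. The sketch below is not the question's query; it uses constant comparisons as stand-ins (1 = 0 plays the role of a failing e.mapid = m.mapid) purely to show how the grouping changes the outcome.

```sql
-- Stand-in comparisons only: 1 = 0 represents a failing "e.mapid = m.mapid",
-- the two 1 = 1 tests represent the employee branch and the ZIP branch.
SELECT
    -- Parsed as (1 = 0 AND 1 = 1) OR 1 = 1, so the OR branch alone is enough.
    CASE WHEN 1 = 0 AND 1 = 1 OR 1 = 1
         THEN 'row kept' ELSE 'row dropped' END AS without_parentheses,
    -- With explicit parentheses the first test guards both branches.
    CASE WHEN 1 = 0 AND (1 = 1 OR 1 = 1)
         THEN 'row kept' ELSE 'row dropped' END AS with_parentheses;
-- without_parentheses = 'row kept', with_parentheses = 'row dropped'
```

This is why the unparenthesized ON clause can attach an employee row through the ZIP branch even when the mapid test fails.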

In case of need

In SQL, (condition1 AND condition2) OR ((NOT condition1) AND condition3)
can be written CASE WHEN condition1 THEN condition2 ELSE condition3 END
(which avoids repeating condition1, and with it the risk of a typo in one of the copies or of a missed case in more complex conditions).

There's just one small problem: in SQL Server a comparison is not an expression with a boolean value that CASE could return, so we cannot write:
where case when e.employeeid is not null then r.employeeid=e.employeeid else r.zip5=m.zip5 end

Of course we could use a where 1 = case when /* something returning 1 in case of a match */ end,
but why not be creative and return something that matches the first condition?

where r.employeeid = coalesce(e.employeeid, case when r.zip5=m.zip5 then r.employeeid end)

This reads as follows (thanks to the coalesce):

  • if e.employeeid is not null, the result of r.employeeid = e.employeeid
  • else, if e.employeeid is null, the result of r.employeeid = case when r.zip5=m.zip5 then r.employeeid end,
    which will match if r.zip5=m.zip5 (warning: this only holds if r.employeeid is never null, because NULL = NULL does not evaluate to true)

(third query of the db<>fiddle)
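
For comparison, here is a sketch of the plainer 1 = CASE form mentioned above. It is only one possible way to write it, and it makes some assumptions: the sample data is a trimmed copy of the question's (two mapids, and only the columns the filter needs), and the nested CASE is my own arrangement. Because it never compares r.EmployeeID with itself, it does not depend on r.EmployeeID being non-null.

```sql
-- Trimmed copy of the question's sample data (assumption: two mapids are enough here).
DROP TABLE IF EXISTS #map;
DROP TABLE IF EXISTS #employee;
DROP TABLE IF EXISTS #Region;

CREATE TABLE #map      (zip5 varchar(10), mapid varchar(10));
CREATE TABLE #employee (EmployeeID varchar(10), mapid varchar(10));
CREATE TABLE #Region   (mapid varchar(10), EmployeeID varchar(10), zip5 varchar(10), Region varchar(50));

INSERT INTO #map      VALUES ('45678', '7890'), ('00899', '4567');
INSERT INTO #employee VALUES ('200', '7890'), (NULL, '4567');
INSERT INTO #Region   VALUES ('7890', '200', '45678', 'West S San Diago'),
                             ('7890', NULL,  '45678', 'West So. CA'),
                             ('4567', NULL,  '00899', 'south');

SELECT m.mapid, m.zip5, r.Region
FROM #map m
LEFT JOIN #Region r   ON r.mapid = m.mapid
LEFT JOIN #employee e ON e.mapid = m.mapid
WHERE 1 = CASE
              -- An employee ID is known: keep only that employee's region.
              WHEN e.EmployeeID IS NOT NULL
                  THEN CASE WHEN r.EmployeeID = e.EmployeeID THEN 1 ELSE 0 END
              -- No employee ID: fall back to the ZIP match.
              WHEN r.zip5 = m.zip5 THEN 1
              ELSE 0
          END
ORDER BY m.mapid;
-- 4567 | 00899 | south
-- 7890 | 45678 | West S San Diago
```

Like the WHERE-based query above, this drops maps that have no matching region at all, since the filter rejects rows where r is entirely NULL.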

Apples and bananas

However, as both MatBailie and you point out, a problem remains. It all comes down to one thing: do not try to mix different realities in the same table alias, that is, do not try to have r represent both the region of the employee, and the region for the ZIP of the map.

Although you have an intermediate employee table, this is essentially the same as a self-join, for example when picking an employee and their employer from a persons table, or apples and bananas from a fruits table when you want to compare apples to bananas (their counts, for example). In those cases, once you have "specialized" a row for a given role that you want to compare with other roles from the same set, that row should not try to play both roles in the same columns. It is better to have one row with columns coming from bananas and columns coming from apples than one row for apples and one row for bananas: columns are easier to compare than rows.

(well, it's not the only way: in fact you could succeed in having both regions in the same row, by left join #Region r on r.mapid=m.mapid or r.employeeid=e.employeeid (do not choose, keep all possibilities) then with window functions telling you if there's another region linked to the employee in the same set (= linked to the same map); but would it be worth the effort?)

So let's add an re alias for the employee's associated region.

As your requirement is "an employee necessarily with a region", I would avoid chaining left join #Employee e left join #Region re, which could return employees without a region; the more formal left join (#employee e join #Region re) effectively discards employees without a matching region.
In your case, as you are only interested in columns from re, not from e, the end result will be the same (the columns of re will be null if there's no employee, or if there's an employee without a region), but as a matter of good habit I'll use the second form, which ensures an all-or-nothing result for the columns of e too.

Finally, as we now have two different Regions, our switching logic is not in the join anymore, but in the select of the final columns, where we can use a simple coalesce.

select distinct
  m.*,
  coalesce(re.region, r.region) as region
from #map m
left join #Region r
on r.mapid=m.mapid and r.zip5=m.zip5
left join (#employee e join #Region re on re.employeeid=e.employeeid)
on e.mapid=m.mapid

zip5   mapid  Prodcode  proddesc  amount   region
00899  3457   24567     ISC       17645    null
00899  4567   24567     PIT       1234     south
04987  9879   24567     ISC       17645    South
45678  7890   23458     POT       1234789  West S San Diago
56333  5678   24567     MHT       23400    EastCentral

(note the null region for mapid 3457: its employee, 500, has no region, and there is no matching ZIP under its own mapid; I think this is what you expect)

(very last query of this updated fiddle)


3 Comments

Doesn't give results for all of the other map entries... dbfiddle.uk/pAkTwTCg Whereas the following gives one result per map entry, but the OP isn't clear enough to me for it to be an answer; dbfiddle.uk/szmvs6hC
Thanks! I took time to re-read the question, and added an "Apples and bananas" paragraph to my answer, with a query that I think integrates all of unicorn's specs. Note that, contrary to your query, it returns null for mapid 3457, because the only matching zip5 is not on the same mapid. Simply adding an AND r2.mapid = m.mapid to your last JOIN's condition allows our queries to match, though (dbfiddle.uk/F_UL2uSe).
where r.employeeid = coalesce(e.employeeid, case when r.zip5=m.zip5 then r.employeeid end) works for this case, but it won't work for mapid 4567, which has a null employeeID but a valid zip5
0

And what if you just do:

select m.mapid, m.prodcode, m.proddesc, m.amount, m.zip5, 
    coalesce(re.region, r.region) as region 
from #map m
left join #employee e 
join #region re on re.EmployeeID = e.EmployeeID
    on e.mapid = m.mapid
left join #region r on r.zip5 = m.zip5 and e.EmployeeID is null
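
If the ZIP fallback should only consider regions under the same mapid, one possible adjustment is to add a mapid comparison to the last join. The sketch below assumes that reading of the requirement; it reuses the question's sample data with the #map columns cut down to the two the joins need, and writes the nested join with explicit parentheses.

```sql
-- Question's sample data, with #map reduced to zip5 and mapid (assumption for brevity).
DROP TABLE IF EXISTS #map;
DROP TABLE IF EXISTS #employee;
DROP TABLE IF EXISTS #Region;

CREATE TABLE #map      (zip5 varchar(10), mapid varchar(10));
CREATE TABLE #employee (EmployeeID varchar(10), mapid varchar(10));
CREATE TABLE #Region   (mapid varchar(10), EmployeeID varchar(10), zip5 varchar(10), Region varchar(50));

INSERT INTO #map VALUES
    ('04987', '9879'), ('45678', '7890'), ('56333', '5678'),
    ('00899', '4567'), ('00899', '3457');
INSERT INTO #employee VALUES
    ('100', '9879'), ('200', '7890'), ('400', '5678'),
    (NULL, '4567'), ('500', '3457');
INSERT INTO #Region VALUES
    ('9879', '100', '04987', 'South'),
    ('7890', '200', '45678', 'West S San Diago'),
    ('7890', NULL,  '45678', 'West So. CA'),
    ('5678', '400', '56333', 'EastCentral'),
    ('4567', NULL,  '00899', 'south');

SELECT m.mapid, m.zip5,
       COALESCE(re.Region, r.Region) AS region
FROM #map m
-- Employee together with that employee's region, or nothing at all.
LEFT JOIN (#employee e JOIN #Region re ON re.EmployeeID = e.EmployeeID)
       ON e.mapid = m.mapid
-- ZIP fallback, used only when no employee region was found,
-- and restricted to regions filed under the same mapid.
LEFT JOIN #Region r
       ON r.zip5 = m.zip5
      AND r.mapid = m.mapid
      AND e.EmployeeID IS NULL
ORDER BY m.mapid;
-- 3457 | 00899 | NULL
-- 4567 | 00899 | south
-- 5678 | 56333 | EastCentral
-- 7890 | 45678 | West S San Diago
-- 9879 | 04987 | South
```

Without the r.mapid = m.mapid line, mapid 3457 would pick up 'south' from the region row stored under mapid 4567, because both share zip5 00899.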

7 Comments

This worked very well. Thank you very much.
While arriving at roughly the same result in the rework of my answer, I note that our results differ on 3457: I return null while this answer returns 'south'. Could you specify which result is expected for this case, where there is a matching zip5, but on another mapid? (N.B.: this is a minor difference that only depends on adding the condition and m.mapid = r.mapid to the last join, as can be seen in this fiddle adapted from p3consulting's one. Still, it would be great to have all solutions consistent with the question.)
This is a good catch. 3457 doesn't exist in Region, so I expect NULL. This can be fixed with LEFT JOIN with Region instead of join #region re
A good answer contains a text explanation in addition to a hopefully working answer.