
I'm writing a query that checks records across 6 different tables (a base table of IDs plus 5 tables that each hold one value to check) and, wherever a value is NULL, flags that value as missing.

However, what I'm struggling with is the case where a record is missing more than one of the checked values. Since any number of the 5 checked values (up to all of them) may be missing, I need to output one flag for each value a given record is missing. Everything I've found gets close in various ways, but each approach feels counterintuitive, clunky, or more complex than it should be.

Here's a baseline for what I've got so far (the main query doesn't run yet; combining the CASE expressions is where I'm stuck):

-- INIT database
CREATE TABLE Table0 ([UniqID] INT);

CREATE TABLE Table1 ([UniqID] INT, [Value1] VARCHAR(50));
 
CREATE TABLE Table2 ([UniqID] INT, [Value2] VARCHAR(50));

CREATE TABLE Table3 ([UniqID] INT, [Value3] VARCHAR(50));

CREATE TABLE Table4 ([UniqID] INT, [Value4] VARCHAR(50));

CREATE TABLE Table5 ([UniqID] INT, [Value5] VARCHAR(50));

-- ADD data
INSERT INTO Table0([UniqID]) VALUES (1001);
INSERT INTO Table0([UniqID]) VALUES (1002);
INSERT INTO Table0([UniqID]) VALUES (1003);
INSERT INTO Table0([UniqID]) VALUES (1004);

INSERT INTO Table1([UniqID], [Value1]) VALUES (1001, NULL);
INSERT INTO Table1([UniqID], [Value1]) VALUES (1002, NULL);
INSERT INTO Table1([UniqID], [Value1]) VALUES (1003, 'Record3');
INSERT INTO Table1([UniqID], [Value1]) VALUES (1004, NULL);

INSERT INTO Table2([UniqID], [Value2]) VALUES (1001, NULL);
INSERT INTO Table2([UniqID], [Value2]) VALUES (1002, 'Record2');
INSERT INTO Table2([UniqID], [Value2]) VALUES (1003, 'Record3');
INSERT INTO Table2([UniqID], [Value2]) VALUES (1004, 'Record4');

INSERT INTO Table3([UniqID], [Value3]) VALUES (1001, NULL);
INSERT INTO Table3([UniqID], [Value3]) VALUES (1002, 'Record2');
INSERT INTO Table3([UniqID], [Value3]) VALUES (1003, 'Record3');
INSERT INTO Table3([UniqID], [Value3]) VALUES (1004, 'Record4');

INSERT INTO Table4([UniqID], [Value4]) VALUES (1001, 'Record1');
INSERT INTO Table4([UniqID], [Value4]) VALUES (1002, NULL);
INSERT INTO Table4([UniqID], [Value4]) VALUES (1003, 'Record3');
INSERT INTO Table4([UniqID], [Value4]) VALUES (1004, NULL);

INSERT INTO Table5([UniqID], [Value5]) VALUES (1001, 'Record1');
INSERT INTO Table5([UniqID], [Value5]) VALUES (1002, 'Record2');
INSERT INTO Table5([UniqID], [Value5]) VALUES (1003, NULL);
INSERT INTO Table5([UniqID], [Value5]) VALUES (1004, NULL);

-- Main query
SELECT 
    Table0.[UniqID],
    STRING_AGG(CASE WHEN Table1.[Value1] IS NULL THEN 'Flag 1' ELSE '' END
               CASE WHEN Table2.[Value2] IS NULL THEN 'Flag 2' ELSE '' END
               CASE WHEN Table3.[Value3] IS NULL THEN 'Flag 3' ELSE '' END
               CASE WHEN Table4.[Value4] IS NULL THEN 'Flag 4' ELSE '' END
               CASE WHEN Table5.[Value5] IS NULL THEN 'Flag 5' ELSE '' END, ', ') AS 'Reason(s) for Inclusion'
FROM 
    Table0
 
-- JOIN section
LEFT JOIN 
    Table1 ON Table0.[UniqID] = Table1.[UniqID]
LEFT JOIN 
    Table2 ON Table0.[UniqID] = Table2.[UniqID]
LEFT JOIN 
    Table3 ON Table0.[UniqID] = Table3.[UniqID]
LEFT JOIN 
    Table4 ON Table0.[UniqID] = Table4.[UniqID]
LEFT JOIN 
    Table5 ON Table0.[UniqID] = Table5.[UniqID]
 
-- Filtering to pull where NULLs may exist
WHERE 
    Table1.[Value1] IS NULL
    OR Table2.[Value2] IS NULL
    OR Table3.[Value3] IS NULL
    OR Table4.[Value4] IS NULL
    OR Table5.[Value5] IS NULL

The desired output should be something like this:

UniqID  Reason(s) for Inclusion
--------------------------------
1001    Flag 1, Flag 2, Flag 3
1002    Flag 1, Flag 4
1003    Flag 5
1004    Flag 1, Flag 4, Flag 5

I've tried CASE expressions on their own, both inside and outside of STRING_AGG(); CONCAT() with several independent CASE expressions; and simply concatenating the results with +, using both independent CASE expressions and ISNULL(), like this:

ISNULL(Table1.[Value1], 'Flag 1') + ', ' + ISNULL(Table2.[Value2], 'Flag 2')...

If there's more info I can provide, let me know; I'm a Stack Overflow newbie, but I'm happy to try!

  • You need to edit your post to add a tag for the specific RDBMS you're using, as syntax and functionality vary widely between them. You can remove the evaluation and output tags, as they add nothing useful to your post. Commented Jan 30 at 2:30
  • The question seems interesting. Please add a few rows of sample data and also the expected result for it. Commented Jan 30 at 2:37
  • Hopefully this edited query helps. I've added more, including the current state of my solution (which I know won't run, but that's where I've left off with what I've been trying). If there's a better format, can someone link me an example so I can lay this out better for you all? Commented Jan 30 at 4:21
  • Since you haven't explicitly included NOT NULL constraints on the value columns are you looking specifically for when they store NULL values, or are you wanting to flag them when the entire row is absent? Both? Commented Jan 30 at 4:34
  • The tables in question may have NULL values, so my query is intended to identify the records with NULL values and where those NULLs are, so we can correct them. Commented Jan 30 at 4:44

1 Answer


You're almost there. I just tweaked the way you determine the 'Reason(s) for Inclusion' column.

Instead of the STRING_AGG, I just used + to combine the strings.

Note also that each flag string starts with a comma and a space, e.g. ', Flag 1'. This means a separator is only added when a flag is also added; however, it also means the resulting string always starts with ', '. Therefore I use the STUFF function to remove those first two characters.
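To see the trimming step in isolation: STUFF(string, start, length, replacement) deletes length characters beginning at the 1-based position start and inserts replacement in their place. The string literals below are only illustrative inputs, not values from the tables.

```sql
-- Delete 2 characters starting at position 1 and insert nothing in their place
SELECT STUFF(', Flag 1, Flag 4', 1, 2, '') AS [Trimmed];  -- 'Flag 1, Flag 4'

-- Edge case: for an empty string the start position is past the end, so STUFF returns NULL.
-- The question's WHERE clause keeps only rows with at least one NULL value,
-- so at least one flag is always present and this case doesn't arise there.
SELECT STUFF('', 1, 2, '') AS [EmptyInput];  -- NULL
```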

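SELECT Table0.[UniqID],
  STUFF(
    CASE WHEN Table1.[Value1] IS NULL THEN ', Flag 1' ELSE '' END
    + CASE WHEN Table2.[Value2] IS NULL THEN ', Flag 2' ELSE '' END
    + CASE WHEN Table3.[Value3] IS NULL THEN ', Flag 3' ELSE '' END
    + CASE WHEN Table4.[Value4] IS NULL THEN ', Flag 4' ELSE '' END
    + CASE WHEN Table5.[Value5] IS NULL THEN ', Flag 5' ELSE '' END,
    1, 2, '') AS [Reason(s) for Inclusion]
FROM Table0
LEFT JOIN Table1 ON Table0.[UniqID] = Table1.[UniqID]
LEFT JOIN Table2 ON Table0.[UniqID] = Table2.[UniqID]
LEFT JOIN Table3 ON Table0.[UniqID] = Table3.[UniqID]
LEFT JOIN Table4 ON Table0.[UniqID] = Table4.[UniqID]
LEFT JOIN Table5 ON Table0.[UniqID] = Table5.[UniqID]
WHERE Table1.[Value1] IS NULL OR Table2.[Value2] IS NULL
   OR Table3.[Value3] IS NULL OR Table4.[Value4] IS NULL
   OR Table5.[Value5] IS NULL;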

Results

UniqID  Reason(s) for Inclusion
--------------------------------
1001    Flag 1, Flag 2, Flag 3
1002    Flag 1, Flag 4
1003    Flag 5
1004    Flag 1, Flag 4, Flag 5
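If you're on SQL Server 2017 or later (which STRING_AGG, already used in the question, also requires), CONCAT_WS can stand in for the STUFF trick: it skips NULL arguments and does not add a separator for them. Leaving out the ELSE makes each CASE return NULL when the value is present. A sketch, assuming the question's Table0 to Table5 schema:

```sql
-- CONCAT_WS ignores NULL arguments, so no leading ', ' is produced and STUFF is not needed
SELECT
    Table0.[UniqID],
    CONCAT_WS(', ',
        CASE WHEN Table1.[Value1] IS NULL THEN 'Flag 1' END,  -- no ELSE: NULL when the value exists
        CASE WHEN Table2.[Value2] IS NULL THEN 'Flag 2' END,
        CASE WHEN Table3.[Value3] IS NULL THEN 'Flag 3' END,
        CASE WHEN Table4.[Value4] IS NULL THEN 'Flag 4' END,
        CASE WHEN Table5.[Value5] IS NULL THEN 'Flag 5' END
    ) AS [Reason(s) for Inclusion]
FROM Table0
LEFT JOIN Table1 ON Table0.[UniqID] = Table1.[UniqID]
LEFT JOIN Table2 ON Table0.[UniqID] = Table2.[UniqID]
LEFT JOIN Table3 ON Table0.[UniqID] = Table3.[UniqID]
LEFT JOIN Table4 ON Table0.[UniqID] = Table4.[UniqID]
LEFT JOIN Table5 ON Table0.[UniqID] = Table5.[UniqID]
-- Keep only IDs with at least one missing value (otherwise CONCAT_WS returns '')
WHERE Table1.[Value1] IS NULL OR Table2.[Value2] IS NULL
   OR Table3.[Value3] IS NULL OR Table4.[Value4] IS NULL
   OR Table5.[Value5] IS NULL;
```

The structure is otherwise the same as the + version; the only difference is who handles the separators.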

2 Comments

This works great! A question for my own education, though: would this approach be viable at scale? For example, if the number of checks were well over 5, is this still the only way to get the same results, or is there another route if 20+ flags were needed?
Given the data structure, I believe this is as viable as any other method. Why? You need to read the data from all the tables at least once to check whether it exists, so the joins are as good as any other method. As they are all separate tables, you need to reference every table and field at some point. Performance will be helped by indexes (e.g., a primary key/clustered index) on UniqID in every table. Be wary: if any table has two or more rows for the same UniqID, you will get repeated rows in the output (and will need to tweak the approach to accommodate this).
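On the 20+ flags question, one alternative that keeps each extra check to a single line is to turn the checks into rows with CROSS APPLY (VALUES ...) and then aggregate them with STRING_AGG. This is also why the original STRING_AGG attempt couldn't work as written: STRING_AGG concatenates values across the rows of a group, not across columns of a single row. It still needs one LEFT JOIN per table, so the point above about reading every table still holds. A sketch, assuming the question's schema and SQL Server 2017+ (for STRING_AGG ... WITHIN GROUP); the derived-table names f, [FlagNo], [Flag] and [Value] are made up for illustration:

```sql
SELECT
    t0.[UniqID],
    STRING_AGG(f.[Flag], ', ') WITHIN GROUP (ORDER BY f.[FlagNo]) AS [Reason(s) for Inclusion]
FROM Table0 AS t0
-- One LEFT JOIN per checked table, as in the question
LEFT JOIN Table1 AS t1 ON t0.[UniqID] = t1.[UniqID]
LEFT JOIN Table2 AS t2 ON t0.[UniqID] = t2.[UniqID]
LEFT JOIN Table3 AS t3 ON t0.[UniqID] = t3.[UniqID]
LEFT JOIN Table4 AS t4 ON t0.[UniqID] = t4.[UniqID]
LEFT JOIN Table5 AS t5 ON t0.[UniqID] = t5.[UniqID]
-- Unpivot: each checked column becomes one row of (flag number, label, value)
CROSS APPLY (VALUES
    (1, 'Flag 1', t1.[Value1]),
    (2, 'Flag 2', t2.[Value2]),
    (3, 'Flag 3', t3.[Value3]),
    (4, 'Flag 4', t4.[Value4]),
    (5, 'Flag 5', t5.[Value5])
) AS f([FlagNo], [Flag], [Value])
-- Keep only failed checks; IDs with no NULLs produce no rows and drop out
WHERE f.[Value] IS NULL
GROUP BY t0.[UniqID]
ORDER BY t0.[UniqID];
```

Adding a check means one more JOIN and one more VALUES row. If the checked columns have different data types, CAST them to a common type inside VALUES, since the column's type is inferred from all rows and an implicit conversion can fail. Because of the GROUP BY, a duplicated UniqID in one of the tables would show up as repeated flags inside a single string rather than as repeated output rows.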
