0

Looking for the most efficient way to check for nulls and have a desired output for a report. This is done in a Hadoop environment.

For example, Database contains:

FirstName LastName State
John {null} OH
Jane Smith {null}
Pikachu Pokemon HI

The output from the query would be:

RuleNum ColName DefectCount TableCount DateofData
Rule001 FirstName 0 50 2023-01-01
Rule002 LastName 1 50 2023-01-01
Rule003 State 1 50 2023-01-01

The embarassingly clunky and inefficient query that that I use, which works- but it takes a LONG time to get results is(and we check thousands of columns)- and i'm unsure as to how to get the table count included efficiently:

SELECT
CAST('Rule001' AS VARCHAR(255)) AS RuleNum,
CAST('FirstName' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE FirstName IS NULL

UNION ALL

SELECT
CAST('Rule002' AS VARCHAR(255)) AS RuleNum,
CAST('LastName' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE LastName IS NULL

UNION ALL

SELECT
CAST('Rule003' AS VARCHAR(255)) AS RuleNum,
CAST('State' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE State IS NULL
8
  • 2
    I've removed the conflictin (R)DBMS tags. You will need to edit your question to (re)tag the product you are actually using. Commented Jan 29, 2023 at 16:22
  • for the output you wanr there isn't a better way, but why not have all columns defined as NIOT NULL, so that you don't need to count Commented Jan 29, 2023 at 16:40
  • Yeah, I assumed it was SQL Server until CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData. So not so sure anymore. Commented Jan 29, 2023 at 16:54
  • I would build a dynamic query to get a row of null counts for nullable columns of the table and unpivot it. Commented Jan 29, 2023 at 17:53
  • The date field isn't as important- the syntax can be adjusted... or even put in at a later time in Excel or another BI tool that we're researching(Alteryx). @Serg any tips or resources as to where I may find such a dynamic query, please? Thanks! Commented Jan 30, 2023 at 15:29

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.