I’m working on a data quality workflow where I validate incoming records for null or missing values.
Even when a column clearly contains nulls, my rule doesn’t trigger and the record passes validation.

Here’s the logic I’m using:

CASE   
   WHEN column_name IS NULL THEN 'FAIL'  
   ELSE 'PASS'  
END

But NULL records still return PASS.

Things I’ve checked:

  • The column datatype is VARCHAR

  • The source file is CSV

  • Some values look empty (""), but I’m not sure whether they are treated as NULL

My question:

Is there a difference between an empty string and NULL in SQL during validation? If so, how can I reliably tell an actual NULL apart from a whitespace-only value and from an empty string?

  • This is a problem in almost all RDBMS (Oracle being the exception, as it treats the empty string as NULL). Unlike dates and numbers, which are either given or null, strings can be given, null, or empty. Consider whether you want to be able to distinguish null from the empty string and what the difference would even mean. I have actually never seen a need for the distinction, so get rid of one or the other. To get rid of nulls, add a NOT NULL constraint. To get rid of the empty string, add a check constraint or a trigger that converts the empty string into null. Keeping nulls is preferable IMHO, for they work nicely in aggregations. Commented Nov 17 at 13:25
  • Note that tables have rows, not records. Commented Nov 17 at 13:32
  • case when column_name <> '' then 'PASS' else 'FAIL' end, will return FAIL for both empty strings and null values. Commented Nov 17 at 13:33

1 Answer

The correct code would be

CASE
    WHEN column_name IS NULL OR column_name = '' THEN 'FAIL'
    ELSE 'PASS'
END

The reason is that NULL is different from an empty string. NULL means "unknown", whereas an empty string is a known value that happens to have zero length, so column_name IS NULL is not true for it. Whitespace-only strings in your input will also pass this check. If you want them to fail as well, the check has to strip or pattern-match the whitespace (with a regex, for example), and because the syntax for that differs between RDBMS, we would need to know which RDBMS you are using to give an exact expression.
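To make the three cases concrete, here is a minimal sketch that runs the classification in SQLite through Python's sqlite3 module. SQLite is only a stand-in, since the RDBMS in the question is not stated, and the table name incoming, the helper classify_rows, and the labels are hypothetical. Results can differ on other systems; Oracle, for example, treats the empty string as NULL.

```python
import sqlite3

# Hypothetical query: labels each row by which "missing" case it falls into.
# The order of the WHEN branches matters: NULL first, then '', then blanks.
CLASSIFY_SQL = """
SELECT id,
       CASE
           WHEN column_name IS NULL    THEN 'NULL'
           WHEN column_name = ''       THEN 'EMPTY'
           WHEN trim(column_name) = '' THEN 'BLANK'
           ELSE 'VALUE'
       END AS kind
FROM incoming
ORDER BY id
"""


def classify_rows(values):
    """Load values into an in-memory table and return one label per row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE incoming (id INTEGER PRIMARY KEY, column_name VARCHAR)"
        )
        # Python None is bound as SQL NULL; "" is bound as an empty string.
        conn.executemany(
            "INSERT INTO incoming (id, column_name) VALUES (?, ?)",
            list(enumerate(values, start=1)),
        )
        return [kind for _id, kind in conn.execute(CLASSIFY_SQL)]
    finally:
        conn.close()
```

Under these assumptions, classify_rows([None, "", "   ", "abc"]) should return the labels NULL, EMPTY, BLANK, and VALUE, in that order. Running a query like this against a sample of the loaded CSV data is one way to see which of the cases the loader actually produced.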

EDIT

As Jonas Metzler pointed out in the comments, the trim function could be used, as in

CASE
    WHEN column_name IS NULL OR trim(column_name) = '' THEN 'FAIL'
    ELSE 'PASS'
END

as long as it exists in your RDBMS.
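One caveat worth checking: a single-argument trim typically strips spaces only, so a value made up of tabs or line breaks may still pass. Below is a hedged sketch of that behaviour, using SQLite via Python's sqlite3 module as a stand-in for the unnamed RDBMS. The validate helper and the WHITESPACE set are hypothetical, and the two-argument trim(X, Y) form is SQLite syntax that other systems spell differently.

```python
import sqlite3

# Assumed set of characters to treat as whitespace: space, tab, CR, LF.
WHITESPACE = " \t\r\n"

# Same rule as the answer's CASE expression, with the characters to strip
# passed in explicitly. SQLite's trim(X, Y) removes any character in Y
# from both ends of X.
VALIDATE_SQL = """
SELECT CASE
           WHEN :v IS NULL OR trim(:v, :chars) = '' THEN 'FAIL'
           ELSE 'PASS'
       END
"""


def validate(value, strip_chars=" "):
    """Return 'FAIL' for NULL, empty, or strip_chars-only values, else 'PASS'."""
    conn = sqlite3.connect(":memory:")
    try:
        (result,) = conn.execute(
            VALIDATE_SQL, {"v": value, "chars": strip_chars}
        ).fetchone()
        return result
    finally:
        conn.close()
```

With the default strip_chars=" ", which mirrors a plain trim(column_name), a tab-only value such as "\t" still passes. Calling validate("\t", WHITESPACE) fails it instead. Whether tabs and line breaks should count as empty is a decision for the validation rule, not something the database decides for you.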

2 Comments

Maybe add TRIM to the second condition to remove whitespace?
That's a good idea, provided the trim function exists in the RDBMS. Editing the answer.
