
I am trying to write a WHERE clause that checks that a certain string variable is neither NULL nor empty. The problem I am running into is that certain non-empty strings compare equal to the N'' literal. For instance:

declare @str nvarchar(max) = N'㴆';
select case when @str = N'' then 1 else 0 end;

This yields 1. From what I can gather on Wikipedia, this particular Unicode character is a pictograph for submerging something, which is not semantically equal to an empty string. Also, the string length is 1, at least in T-SQL.

Is there a better (accurate) way to check a T-SQL variable for the empty string?

  • A "no value" entry should be NULL. Then it is really easy to check: @str is not null ... Commented Feb 19, 2017 at 19:24
  • The application in question uses NULL to indicate "Do not modify this field" and N'' to indicate "Update this value to NULL". The design is intentional so that fields not needing modification can be omitted from data modification messages. Commented Feb 19, 2017 at 19:32
  • Do you mean: check if the value of nvarchar is ANSI STD? Commented Feb 19, 2017 at 20:08
  • Yields 1 with what database? SQLite 3, Postgres 9.6, and MySQL 5.7 all work with select case when N'㴆' = N'' then 1 else 0 end; (SQLite doesn't support N literals, but it works without it) Have you checked what's in @str? Commented Feb 19, 2017 at 20:14
  • @Schwern Microsoft SQL Server 2008 through 2016 all yield 1. The value of @str is the actual character in the post. I'm guessing this is a Microsoft problem. Commented Feb 19, 2017 at 20:28

1 Answer


I found a blog post, https://bbzippo.wordpress.com/2013/09/10/sql-server-collations-and-string-comparison-issues/, which explains:

"The problem is because the “default” collation setting (SQL_Latin1_General_CP1_CI_AS) for SQL Server cannot properly compare Unicode strings that contain so called Supplementary Characters (4-byte characters)."

A fix is to use a collation that doesn't have problems with the supplementary characters. For example:

select case when N'㴆' COLLATE Latin1_General_100_CI_AS_KS_WS = N'' then 1 else 0 end;

will return 0. See the blog for more examples.
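Applied to the WHERE clause from the question, the collation can be forced on the variable side of the comparison. This is a minimal sketch, assuming SQL Server 2008 or later (where the version-100 collations exist); only the COLLATE clause comes from this answer, and the `select ... where` without a FROM is just a stand-in for the real query.

```sql
declare @str nvarchar(max) = N'㴆';

-- Force a version-100 collation on the variable so the comparison uses
-- sort weights that cover U+3D06. A NULL @str fails both conditions.
select N'has value' as result
where @str is not null
  and @str collate Latin1_General_100_CI_AS <> N'';
```

This should return one row for N'㴆'. One caveat: a value made up only of spaces still compares equal to N'' here, because SQL Server's = and <> ignore trailing spaces regardless of collation.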

Since you are comparing to the empty string, another solution would be to test the string length.

declare @str1 nvarchar(max) = N'㴆';
select case when len(@str1) = 0 then 1 else 0 end;

This will return 0 as expected.

This also yields 0 when the string is NULL, because LEN(NULL) returns NULL, so the comparison is not true and the CASE falls through to ELSE.
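One caveat with LEN: it excludes trailing spaces, so LEN(N' ') is also 0 and a string of spaces would be reported as empty. If only a truly zero-length string should count as empty, DATALENGTH (which returns the number of bytes) sidesteps both the collation issue and the trailing-space rule. A sketch of that alternative, not something from the original answer:

```sql
declare @str nvarchar(max) = N'㴆';

-- DATALENGTH counts bytes (2 per UTF-16 code unit for nvarchar), does not
-- depend on collation, and does not trim trailing spaces.
select
    datalength(@str) as bytes,        -- 2 for N'㴆'
    len(@str)        as len_chars,    -- 1 for N'㴆'
    len(N' ')        as len_space,    -- 0: LEN excludes trailing spaces
    datalength(N' ') as bytes_space;  -- 2

-- WHERE-clause form: true only for a non-NULL, non-zero-length value,
-- since DATALENGTH(NULL) is NULL and NULL > 0 is not true.
select N'has value' as result
where datalength(@str) > 0;
```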

EDIT:

Thanks to devio's comment, I dug a bit deeper and found a comment from Erland Sommarskog, https://groups.google.com/forum/#!topic/microsoft.public.sqlserver.server/X8UhQaP9KF0, explaining that in addition to not supporting supplementary characters, the SQL_Latin1_General_CP1_CI_AS collation doesn't handle newer Unicode characters correctly. So I'm guessing that 㴆 is one of those newer characters.

Specifying the collation Latin1_General_100_CI_AS will also fix this issue.


1 Comment

You are correct regarding collations, but SELECT DATALENGTH(@str) returns 2, and CAST(@str AS VARBINARY) shows U+3D06, so it is not a supplementary character (those take 4 bytes, a surrogate pair, in UTF-16).
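The checks in this comment can be reproduced with a query along these lines; UNICODE returns the integer code point of the first character. The expected values in the comments assume the character really is U+3D06, as the comment reports, and that nvarchar stores it in little-endian UTF-16.

```sql
declare @str nvarchar(max) = N'㴆';

select
    datalength(@str)             as bytes,       -- 2: a single UTF-16 code unit, not a surrogate pair
    cast(@str as varbinary(max)) as raw_bytes,   -- 0x063D (little-endian byte order)
    unicode(@str)                as code_point;  -- 15622 = 0x3D06
```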
