SQL statement to check for empty string - T-SQL

Question

I am trying to write a WHERE clause that checks that a string variable is neither NULL nor empty. The problem I am running into is that certain non-empty strings compare equal to the N'' literal. For instance:

declare @str nvarchar(max) = N'㴆';
select case when @str = N'' then 1 else 0 end;

This yields 1. From what I can gather on Wikipedia, this particular Unicode character is a pictograph for submerging something, which is not semantically equal to an empty string. Also, the string length is 1, at least in T-SQL.

Is there a better (accurate) way to check a T-SQL variable for the empty string?

A "no value" entry should be NULL. Then it is really easy to check: @str is not null ...
– juergen d, Commented Feb 19, 2017 at 19:24
The application in question uses NULL to indicate "Do not modify this field" and N'' to indicate "Update this value to NULL". The design is intentional so that fields not needing modification can be omitted from data modification messages.
– Jesan Fafon, Commented Feb 19, 2017 at 19:32
Yields 1 with what database? SQLite 3, Postgres 9.6, and MySQL 5.7 all work with select case when N'㴆' = N'' then 1 else 0 end; (SQLite doesn't support N literals, but it works without them.) Have you checked what's in @str?
– Schwern, Commented Feb 19, 2017 at 20:14
@Schwern Microsoft SQL Server 2008 through 2016 all yield 1. The value of @str is the actual character in the post. I'm guessing this is a Microsoft problem.
– Jesan Fafon, Commented Feb 19, 2017 at 20:28

DeanOC · Accepted Answer · 2017-02-19 23:48:51Z

6

I found a blog post, https://bbzippo.wordpress.com/2013/09/10/sql-server-collations-and-string-comparison-issues/, which explains:

The problem is because the “default” collation setting (SQL_Latin1_General_CP1_CI_AS) for SQL Server cannot properly compare Unicode strings that contain so called Supplementary Characters (4-byte characters).

A fix is to use a collation that doesn't have problems with the supplementary characters. For example:

select case when N'㴆' COLLATE Latin1_General_100_CI_AS_KS_WS = N'' then 1 else 0 end;

will return 0. See the blog for more examples.

Since you are comparing to the empty string, another solution would be to test the string length.

declare @str1 nvarchar(max) =N'㴆';
select case when len(@str1) = 0 then 1 else 0 end;

This will return 0 as expected.

Note that this also yields 0 when the string is NULL, because LEN(NULL) is NULL and the comparison NULL = 0 is not true.
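
One caveat with the length test: LEN ignores trailing spaces, so a string consisting only of spaces also has a LEN of 0. If that distinction matters, DATALENGTH, which counts bytes and does not trim, is an alternative. Below is a minimal sketch, assuming the NULL / N'' convention described in the comments on the question. The action labels are purely illustrative and not taken from the original application.

```sql
-- Sketch only: the 'action' labels are hypothetical, not from the original application.
declare @str nvarchar(max) = N'㴆';

select case
         when @str is null         then 'leave field unchanged'   -- NULL: do not modify
         when datalength(@str) = 0 then 'set field to NULL'       -- truly empty: zero bytes
         else                           'set field to new value'  -- anything else, including N'㴆' or N' '
       end as action;
```

Because the NULL case is tested first, the DATALENGTH branch never sees a NULL input.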

EDIT:

Thanks to devio's comment, I dug a bit deeper and found a comment from Erland Sommarskog: https://groups.google.com/forum/#!topic/microsoft.public.sqlserver.server/X8UhQaP9KF0

He notes that, in addition to not supporting Supplementary Characters, the Latin1_General_CP1_CI_AS collation doesn't handle new Unicode characters correctly. So I'm guessing that the 㴆 character is a new Unicode character.

Specifying the collation Latin1_General_100_CI_AS will also fix this issue.
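
To see the difference side by side, something like the following could be run. It is only a sketch: the first column depends on whatever default collation the current database uses, so it may or may not reproduce the problem on a given server, and the column aliases are made up for illustration.

```sql
-- Sketch: the same emptiness test under the database's default collation
-- and under an explicitly specified version-100 collation.
declare @str nvarchar(max) = N'㴆';

select
    case when @str = N'' then 1 else 0 end as default_collation_result,
    case when @str collate Latin1_General_100_CI_AS = N'' then 1 else 0 end as v100_collation_result;
```

In a WHERE clause the same idea would read @str collate Latin1_General_100_CI_AS <> N'', a predicate that is also not true when @str is NULL.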

edited Feb 19, 2017 at 23:48

answered Feb 19, 2017 at 21:13

DeanOC


1 Comment

devio Over a year ago

You are correct regarding collations, but select DATALENGTH(@str) returns 2, CAST AS VARBINARY gives U+3D06, therefore not a supplementary character.
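
The check described in this comment can be sketched as follows. A supplementary character is stored as a surrogate pair and would occupy 4 bytes, so a DATALENGTH of 2 indicates a single character in the Basic Multilingual Plane. The column aliases are illustrative only.

```sql
-- Sketch: inspect how the character is actually stored.
declare @str nvarchar(max) = N'㴆';

select
    datalength(@str)           as byte_count,   -- 2 for a single BMP character; a surrogate pair would be 4
    cast(@str as varbinary(8)) as utf16_bytes,  -- raw bytes as stored (UTF-16, little-endian)
    unicode(@str)              as code_point;   -- integer code point of the first character
```

If the character is U+3D06, as the comment reports, code_point would be 15622, the decimal form of 0x3D06.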
