
I'm looking at about 13,000 rows in a SQL Server table and trying to parse out certain values from one column that is stored as JSON.

The JSON column values look something like this:

..."http://www.companyurl.com","FoundedYear":"2007","Status":"Private","CompanySize":"51-200","TagLine":"We build software we believe in","Origi...

I'd like to extract the value for "CompanySize", but not all rows include this attribute. Other complicating factors:

  • I'm not sure how many possible values there are within the "CompanySize" parameter.
  • "CompanySize" is not always followed by the "TagLine" parameter.

The one rule I know for certain: the CompanySize value is always a string of unknown length that follows the varchar string "CompanySize":" and terminates before the next "," string.

Ideally we would have upgraded fully to SQL Server 2016 so I'd be able to take advantage of SQL Server's JSON support, but that's not the case.

  • simple-talk.com/sql/t-sql-programming/… Commented Jul 29, 2015 at 19:10
  • Since there's no possibility of indexing such data, and no way to return results without fully scanning your table, I'd just return the entire string and have it processed within a proper environment, unless the string is so huge that you'd rather avoid transferring it. Commented Jul 29, 2015 at 19:10
  • What @Amit said, and even better: actually create a table or schema for this data, and store it properly in the first place. Commented Jul 29, 2015 at 19:50
  • @Amit Why are you assuming that this value is being filtered on? Indexing might not be relevant anyway. It's also fairly easy to get this value and 13k rows is hardly any data to begin with. Commented Jul 29, 2015 at 19:55
  • @JoelCoehoorn You are making some assumptions here that might not be fair. It could be that they already process this data in the app layer, but now there is one particular reason to need a value in the DB that wasn't thought about originally, and no need to store the data twice. I have worked on a system where data was thrown in as JSON because the developers liked it and there was no concern about needing to parse it later in the DB. In that case XML makes more sense since it can be parsed in both places. Commented Jul 29, 2015 at 19:57

2 Answers


You can do this with CHARINDEX: it accepts a start position, which lets you find the closing " of the value. You probably shouldn't look for "," because if CompanySize is the final property, its value won't be followed by ,". Doing this as an Inline Table-Valued Function (iTVF) will be pretty efficient (especially since 13k rows is almost nothing); you just need to use it with either CROSS APPLY or OUTER APPLY:

USE [tempdb];
GO

CREATE FUNCTION dbo.GetCompanySize(@JSON NVARCHAR(MAX))
RETURNS TABLE
AS RETURN

WITH SearchStart AS
(
  SELECT '"CompanySize":"' AS [Fragment]
), Search AS
(
  SELECT CHARINDEX(ss.Fragment, @JSON) AS [Start],
         LEN(ss.Fragment) AS [FragmentLength]
  FROM   SearchStart ss
)
SELECT CASE Search.Start
         WHEN 0 THEN NULL
         ELSE SUBSTRING(@JSON,
                        (Search.Start + Search.FragmentLength),
                        CHARINDEX('"',
                                  @JSON,
                                  Search.Start + Search.FragmentLength
                                 ) - (Search.Start + Search.FragmentLength)
                       )
       END AS [CompanySize]
FROM Search;
GO

Set up the test:

CREATE TABLE #tmp (JSON NVARCHAR(MAX));

INSERT INTO #tmp (JSON) VALUES
('"http://www.companyurl.com","FoundedYear":"2007","Status":"Private","CompanySize":"51-200","TagLine":"We build software we believe in","Origi..');
INSERT INTO #tmp (JSON) VALUES
('"http://www.companyurl.com","FoundedYear":"2009","Status":"Public","TagLine":"We build software we believe in","Origi..');
INSERT INTO #tmp (JSON) VALUES (NULL);

Run the test:

SELECT comp.CompanySize
FROM   #tmp tmp
CROSS APPLY tempdb.dbo.GetCompanySize(tmp.JSON) comp

Returns:

CompanySize
-----------
51-200
NULL
NULL
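
One edge case the function above does not cover: if a value is cut off and has no closing ", the inner CHARINDEX returns 0, the computed length goes negative, and SUBSTRING raises an "Invalid length parameter" error. Below is a minimal sketch of a guard, assuming you would rather get NULL in that case. The variable name, the Doc column, and the sample strings are made up for illustration and are not from the question's table:

```sql
-- Hypothetical sample data; only the search fragment comes from the question.
DECLARE @Fragment NVARCHAR(50) = N'"CompanySize":"';

SELECT t.Doc,
       -- NULLIF turns "closing quote not found" (0) into NULL,
       -- so SUBSTRING returns NULL instead of failing on a negative length.
       SUBSTRING(t.Doc,
                 s.ValueStart,
                 NULLIF(CHARINDEX('"', t.Doc, s.ValueStart), 0) - s.ValueStart
                ) AS CompanySize
FROM (VALUES
        (N'{"Status":"Private","CompanySize":"51-200","TagLine":"x"}'), -- 51-200
        (N'{"Status":"Public","TagLine":"x"}'),                         -- NULL (property absent)
        (N'{"Status":"Private","CompanySize":"51-2')                    -- NULL (no closing quote)
     ) AS t (Doc)
-- ValueStart is NULL when the fragment is not found at all.
CROSS APPLY (SELECT NULLIF(CHARINDEX(@Fragment, t.Doc), 0) + LEN(@Fragment) AS ValueStart) AS s;
```

The same two NULLIF calls could be folded into the iTVF in place of the CASE expression, since SUBSTRING returns NULL when its start or length is NULL.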

1 Comment

Thanks @srutzky this was helpful.

Building on @srutzky's answer, the following solution avoids creating a UDF (you didn't say that was a constraint, but it might be useful for some).

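select
    c.Id,
    -- a start position of 0 with length n returns the first n - 1 characters,
    -- which drops the closing quote located by i3
    substring(i2.jsontail, 0, i3.[length]) CompanySize
from
    Companies c cross apply
    -- i1: position of the property name (0 when the row lacks it)
    ( select charindex('CompanySize":"', c.json) start ) i1 cross apply
    -- i2: everything after the opening quote of the value
    ( select substring(c.json, i1.start + len('CompanySize":"'), len(c.json) - i1.start ) jsontail ) i2 cross apply
    -- i3: position of the closing quote within that tail
    ( select charindex('"', i2.jsontail) [length] ) i3
where
    i1.[start] != 0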
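
For comparison, on SQL Server 2016 or later (the upgrade the question mentions), the built-in JSON_VALUE function can replace the string searching. This is only a sketch: it assumes each row holds a complete, valid JSON object with CompanySize as a top-level property, which the truncated sample in the question doesn't confirm, and the sample documents and the Doc column name below are invented:

```sql
-- Requires SQL Server 2016 or later; the sample rows are hypothetical.
SELECT t.Doc,
       -- In the default lax mode, JSON_VALUE returns NULL when the property is absent.
       JSON_VALUE(t.Doc, '$.CompanySize') AS CompanySize
FROM (VALUES
        (N'{"Status":"Private","CompanySize":"51-200","TagLine":"x"}'),
        (N'{"Status":"Public","TagLine":"x"}')
     ) AS t (Doc);
```

JSON_VALUE raises an error on text that is not valid JSON, so the CHARINDEX approaches above remain the safer choice for fragments or malformed rows.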
