4

I have data that looks something like this example (on an unfortunately much larger scale):

+----+-------+--------------------+-----------------------------------------------+
| ID | Data  | Cost               | Comments                                      |
+----+-------+--------------------+-----------------------------------------------+
| 1  | 1|2|3 | $0.00|$3.17|$42.42 | test test||previous thing has a blank comment |
+----+-------+--------------------+-----------------------------------------------+
| 2  | 1     | $420.69            | test                                          |
+----+-------+--------------------+-----------------------------------------------+
| 3  | 1|2   | $3.50|$4.20        | |test                                         |
+----+-------+--------------------+-----------------------------------------------+

Some of the columns in the table I have are pipeline delimited, but they are consistent by each row. So each delimited value corresponds to the same index in the other columns of the same row.

So I can do something like this which is what I want for a single column:

SELECT ID, s.value AS datavalue
FROM MyTable t CROSS APPLY STRING_SPLIT(t.Data, '|') s

and that would give me this:

+----+-----------+
| ID | datavalue |
+----+-----------+
| 1  | 1         |
+----+-----------+
| 1  | 2         |
+----+-----------+
| 1  | 3         |
+----+-----------+
| 2  | 1         |
+----+-----------+
| 3  | 1         |
+----+-----------+
| 3  | 2         |
+----+-----------+

but I also want to get the other columns as well (cost and comments in this example) so that the corresponding items are all in the same row like this:

+----+-----------+-----------+------------------------------------+
| ID | datavalue | costvalue | commentvalue                       |
+----+-----------+-----------+------------------------------------+
| 1  | 1         | $0.00     | test test                          |
+----+-----------+-----------+------------------------------------+
| 1  | 2         | $3.17     |                                    |
+----+-----------+-----------+------------------------------------+
| 1  | 3         | $42.42    | previous thing has a blank comment |
+----+-----------+-----------+------------------------------------+
| 2  | 1         | $420.69   | test                               |
+----+-----------+-----------+------------------------------------+
| 3  | 1         | $3.50     |                                    |
+----+-----------+-----------+------------------------------------+
| 3  | 2         | $4.20     | test                               |
+----+-----------+-----------+------------------------------------+

I'm not sure what the best or most simple way to achieve this would be

2 Answers 2

8

This isn't going to be achievable with STRING_SPLIT as Microsoft refuse to supply the ordinal position as part of the result set*. As a result, you'll need to use a different function which does. Personally, I recommend DelimitedSplit8k_LEAD, originally written by Jeff Moden and then improved by Eirikur Eiriksson.

Then, you can do this:

CREATE TABLE #Sample (ID int,
                      [Data] varchar(200),
                      Cost  varchar(200),
                      Comments varchar(8000));
GO
INSERT INTO #Sample
VALUES (1,'1|2|3','$0.00|$3.17|$42.42','test test||previous thing has a blank comment'),
       (2,'1','$420.69','test'),
       (3,'1|2','$3.50|$4.20','|test');

GO
SELECT S.ID,
       DSd.Item AS DataValue,
       DSc.Item AS CostValue,
       DSct.Item AS CommentValue
FROM #Sample S
     CROSS APPLY dbo.DelimitedSplit8K_LEAD(S.[Data],'|')  DSd
     CROSS APPLY (SELECT *
                  FROM DelimitedSplit8K_LEAD(S.Cost,'|') SS
                  WHERE SS.ItemNumber = DSd.ItemNumber) DSc
     CROSS APPLY (SELECT *
                  FROM DelimitedSplit8K_LEAD(S.Comments,'|') SS
                  WHERE SS.ItemNumber = DSd.ItemNumber) DSct;

GO
DROP TABLE #Sample;
GO

There is, however, only one true answer to this question: Don't store delimited values in SQL Server. Store them in a normalised manner, and you won't have this problem.

* An ordinal parameter was added to STRING_SPLIT in SQL Server 2022.

SELECT S.ID,
       SSd.Value AS DataValue,
       SSc.Value AS CostValue,
       SSct.Value AS CommentValue
FROM #Sample S
        CROSS APPLY STRING_SPLIT(S.[Data],'|',1)  SSd
        CROSS APPLY (SELECT *
                    FROM STRING_SPLIT(S.Cost,'|',1) SS
                    WHERE SS.ordinal = SSd.ordinal) SSc
        CROSS APPLY (SELECT *
                    FROM STRING_SPLIT(S.Comments,'|',1) SS
                    WHERE SS.ordinal = SSd.ordinal) SSct;
Sign up to request clarification or add additional context in comments.

3 Comments

The ordinal position is so critical. It always shocks me that Jeff Moden's is the only viable splitter that includes it. Most XML splitter at least return them in the same order but I prefer to be explicit rather than assume it works.
Yep, not too fun to work with data like this. Have an API basically feeding us the data this way. Will check this solution out more. Thanks! (Might just end up pulling all of the data out of the DB and splitting it with a script to get it inserted properly)
@SeanLange I honestly expected MS to update the function with 2017 to provide the position. I was actually really disappointed with them when they didn't, as the demand for the functionality from the communities was pretty high.
1

Here is a solution approach using a recursive CTE instead of a User Defined Funtion (UDF) which is useful for those without permission to create functions.

CREATE TABLE mytable(
   ID       INTEGER  NOT NULL PRIMARY KEY 
  ,Data     VARCHAR(7) NOT NULL
  ,Cost     VARCHAR(20) NOT NULL
  ,Comments VARCHAR(47) NOT NULL
);
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (1,'1|2|3','$0.00|$3.17|$42.42','test test||previous thing has a blank comment');
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (2,'1','$420.69','test');
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (3,'1|2','$3.50|$4.20','|test');

This query allows choice of delimiter by using a variable, then using a common table expression it parses each delimited string to produce a rows for each portion of those strings, and retains the ordinal position of each.

declare @delimiter as varchar(1)
set @delimiter = '|'

;with cte as (
      select id
           , convert(varchar(max), null) as datavalue
           , convert(varchar(max), null) as costvalue
           , convert(varchar(max), null) as commentvalue
           , convert(varchar(max), data + @delimiter) as data
           , convert(varchar(max), cost + @delimiter) as cost
           , convert(varchar(max), comments + @delimiter) as comments
      from mytable as t
      union all
      select id
           , convert(varchar(max), left(data, charindex(@delimiter, data) - 1))
           , convert(varchar(max), left(cost, charindex(@delimiter, cost) - 1))
           , convert(varchar(max), left(comments, charindex(@delimiter, comments) - 1))
           , convert(varchar(max), stuff(data, 1, charindex(@delimiter, data), ''))
           , convert(varchar(max), stuff(cost, 1, charindex(@delimiter, cost), ''))
           , convert(varchar(max), stuff(comments, 1, charindex(@delimiter, comments), ''))
      from cte
      where (data like ('%' + @delimiter + '%') and cost like ('%' + @delimiter + '%')) or comments like ('%' + @delimiter + '%')
     )
select id, datavalue, costvalue, commentvalue
from cte
where datavalue IS NOT NULL
order by id, datavalue

As the recursion adds new rows, it places the first portion of the delimited strings into the wanted output columns using left(), then also, using stuff(), removes the last used delimiter from the source strings so that the next row will start at the next delimiter. Note that to initiate the extractions, the delimiter is added to the end of the source delimited strings which is to ensure the where clause does not exclude any of the wanted strings.

the result:

  id   datavalue   costvalue              commentvalue             
 ---- ----------- ----------- ------------------------------------ 
   1           1   $0.00       test test                           
   1           2   $3.17                                           
   1           3   $42.42      previous thing has a blank comment  
   2           1   $420.69     test                                
   3           1   $3.50                                           
   3           2   $4.20       test                

demonstrated here at dbfiddle.uk

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.