
I am trying to select the string between every pair of angle brackets <...> in the fifth column, "QuestionTags".

Here is a sample of the data (posted as a screenshot):


I used the CHARINDEX function, but it returned an integer.

I also tried SUBSTRING, but it requires a starting character index and a length.

Any suggestions?

  • You could combine them, and show your desired result. Commented Apr 15, 2017 at 14:38
  • How do I combine them? Commented Apr 15, 2017 at 14:40
  • You should fix your data structure so you are not storing lists of values as a string. Commented Apr 15, 2017 at 14:49
  • @GordonLinoff - this looks like the StackOverflow data dump. Commented Apr 15, 2017 at 14:50
  • @GordonLinoff, can you elaborate? Commented Apr 15, 2017 at 14:51
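On "how do I combine them?": CHARINDEX supplies the positions and SUBSTRING consumes them. A minimal sketch, assuming a hard-coded sample value modelled on the data used in the second answer (not the asker's real table). It extracts only the first tag; getting every tag means repeating the search, which is what the split function and the tally-table UDF in the answers do.

```sql
-- Sample value (assumed); in practice this would be the QuestionTags column
DECLARE @Tags varchar(max) = '<php><arrays><cloud><tag-cloud>';

-- CHARINDEX returns integer positions...
DECLARE @Start int = CHARINDEX('<', @Tags);          -- position of the first '<' (1)
DECLARE @End   int = CHARINDEX('>', @Tags, @Start);  -- first '>' at or after it (5)

-- ...which become SUBSTRING's start and length arguments
SELECT SUBSTRING(@Tags, @Start + 1, @End - @Start - 1) AS FirstTag;  -- php
```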

2 Answers


If you can use SQL Server 2016 or later, the built-in STRING_SPLIT function will do the job:

SELECT *
FROM   YourTable
       OUTER APPLY (SELECT SUBSTRING(value, 2, 8000) value
                    FROM   string_split(QuestionTags, '>')
                    WHERE  value <> '') OA 

Here is a demo in the Stack Exchange Data Explorer, as it looks like you are using Stack Exchange data.
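A self-contained version of the same query, as a sketch that should run as-is on SQL Server 2016 or later (STRING_SPLIT also requires database compatibility level 130 or higher). The table variable and its two rows are sample data borrowed from the other answer, not the asker's table. Splitting on '>' leaves a leading '<' on each fragment plus one trailing empty string, which is why the query drops the first character and filters out ''.

```sql
-- Sample data (assumed), same shape as the QuestionTags column
DECLARE @YourTable table (ID int, QuestionTags varchar(max));
INSERT INTO @YourTable VALUES
 (1, '<php><arrays><cloud><tag-cloud>'),
 (2, '<windows><mailto>');

SELECT T.ID, OA.value AS Tag
FROM   @YourTable T
       OUTER APPLY (SELECT SUBSTRING(value, 2, 8000) AS value  -- drop the leading '<'
                    FROM   STRING_SPLIT(T.QuestionTags, '>')
                    WHERE  value <> '') OA;                    -- drop the trailing empty piece
```

Note that STRING_SPLIT does not guarantee the order of its output rows, so if tag order matters, the UDF in the other answer, which returns an explicit position, may be the safer choice.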


2 Comments

Is there a reason you chose SUBSTRING() here instead of STUFF()?
@GordonLinoff Just personal preference; I find it clearer.
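For comparison, a quick check that both functions strip the leading '<' that splitting on '>' leaves on each fragment (the literal '<arrays' is just a sample fragment):

```sql
SELECT SUBSTRING('<arrays', 2, 8000) AS ViaSubstring,  -- arrays: everything from character 2
       STUFF('<arrays', 1, 1, '')    AS ViaStuff;      -- arrays: delete 1 char at position 1
```

STUFF avoids the arbitrary 8000 length argument; SUBSTRING reads as "take everything from character 2 on". For short tag strings the results are identical.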

If you are open to a table-valued function and are not on SQL Server 2016:

Tired of extracting strings with CHARINDEX, LEFT, RIGHT, SUBSTRING and so on, I modified a parse/split function to accept two different (non-matching) delimiters; in your case, < and >.

Example

Declare @YourTable table (ID int,QuestionTags varchar(max))
Insert Into @YourTable values
 (1,'<php><arrays><cloud><tag-cloud>')
,(2,'<windows><mailto>')

Select A.ID
      ,B.*
 From  @YourTable A
 Cross Apply [dbo].[udf-Str-Extract](A.QuestionTags,'<','>') B

Returns

ID  RetSeq  RetPos  RetVal
1   1       2       php
1   2       7       arrays
1   3       15      cloud
1   4       22      tag-cloud
2   1       2       windows     --<< Second Record
2   2       11      mailto

The UDF, if interested:

CREATE FUNCTION [dbo].[udf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  

with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
       cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
       cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
       cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
 From  (
        Select *,RetVal = Substring(@String, N, L) 
         From  cte4
       ) A
 Where charindex(@Delimiter2,RetVal)>1

)
/*
Max length of string: 1,000,000 characters

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[udf-Str-Extract] (@String,'[[',']]')
*/
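How the UDF works, as I read it (this is an interpretation, not the author's own description): cte1 is ten rows, and cross-joining it six times yields up to 10^6 row numbers, trimmed by TOP to DATALENGTH(@String); that is where the 1,000,000-character limit comes from. cte3 keeps position 1 plus the position just after every @Delimiter1, cte4 measures from each of those to the next @Delimiter1 (8000 if there is none), and the outer query keeps the text before @Delimiter2, discarding fragments where @Delimiter2 is missing or comes first. A small sketch that reproduces just the number-generation part to confirm the limit:

```sql
DECLARE @MaxPositions bigint;

-- Same ten-row seed as cte1 in the UDF
WITH cte1(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
SELECT @MaxPositions = COUNT_BIG(*)
FROM   cte1 N1, cte1 N2, cte1 N3, cte1 N4, cte1 N5, cte1 N6;  -- 10^6 candidate positions

SELECT @MaxPositions AS MaxPositions;  -- 1000000
```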

9 Comments

@John Cappelletti Using the second method, I get the error message: Incorrect syntax near 'A'.
@Taie Did you apply/create the UDF?
Regarding the table-valued function: can we make it dynamic? I have a large number of rows.
@Taie What is a large number of rows, and why would dynamic do anything?
I have around 8 million rows. By dynamic I mean "automatic".
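On the 8-million-row question: CROSS APPLY already invokes the function once for every row of the outer table, so nothing needs to be made dynamic. A sketch that runs the extraction over the whole table once and stores the result in a normalized tag table, one row per question and tag, in the spirit of the earlier comment about not storing lists of values as a string. dbo.YourTable, its ID column, the dbo.QuestionTag table and the varchar(100) tag width are hypothetical choices, and [dbo].[udf-Str-Extract] must already have been created.

```sql
-- Hypothetical target table: one row per (question, tag) pair
CREATE TABLE dbo.QuestionTag (
    ID     int          NOT NULL,  -- key of the source row (assumed column name)
    TagSeq int          NOT NULL,  -- RetSeq: position of the tag within QuestionTags
    Tag    varchar(100) NOT NULL,  -- assumed to be wide enough for any tag
    CONSTRAINT PK_QuestionTag PRIMARY KEY (ID, TagSeq)
);

-- One set-based statement: CROSS APPLY calls the UDF for each source row,
-- so no loop or dynamic SQL is required
INSERT INTO dbo.QuestionTag (ID, TagSeq, Tag)
SELECT A.ID, B.RetSeq, B.RetVal
FROM   dbo.YourTable A
       CROSS APPLY [dbo].[udf-Str-Extract](A.QuestionTags, '<', '>') B;
```

Once the tags live in their own table, queries such as "all questions tagged php" become simple indexed lookups instead of repeated string parsing.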
