
I need to split the following string:

(2005)[1]1,2,3,4[2]1(2008)[2]2–;3,4(2009)[3]1,2,3-4(2010)[4]1,2,3-4(2011)[5]1(2012)[5]2,3-4[6]1,2[](2014)[6]3-4[7]1-2(2015)[7]3-4[8]1-2(2016)[10]1[8]3-4[9]1-2,3-4(2017)[10]2

As:

1, "1,2,3,4"  
2, 1 2
2, 2–;3,4

For the input "(2005)[1]1,2,3,4", I need the value inside [ ] in capture group 1 and the rest of the string (1,2,3,4) in capture group 2, and the same for every bracketed entry in the entire string.

I have written this regex, but it is not working as intended:

\[(.*?)\](.+?)(?=\[|\(|$)

Please see my regex implementation

The problem is that when nothing follows [], the pattern captures the next (year), which it should not do.

  • Try \[([^\]\[]*)\]([^\[(]*). Also, you might as well replace .+? with .*? in your pattern. Commented Oct 27, 2017 at 13:40
  • It worked perfectly, thanks a lot. Commented Oct 27, 2017 at 13:45
  • I have created a regex101 fiddle, but can't make heads or tails of what you mean now. Please provide the exact output for each test case. Commented Oct 27, 2017 at 19:55
  • So for the first example in the test case, I want to parse the data to generate this output: "2015","5","1" \n "2015","3","22,23,24" \n "2015","8","1,2,3" Commented Oct 27, 2017 at 20:11
  • See this version, is it capturing the right values? Commented Oct 27, 2017 at 20:16

1 Answer

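The (.+?)(?=\[|\(|$) part of the pattern matches one or more chars other than a newline, as few as possible, up to the leftmost [, ( or end of string. Because + requires at least one char, an empty value after ] forces the match to start with the ( of the following (year). You need to allow matching zero or more chars here.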
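To see the difference concretely, here is a minimal Python sketch (Python's re module is an assumption; the question does not name a language). It runs the original pattern and the relaxed one on a short excerpt of the input that contains an empty [] entry.

```python
import re

# Excerpt from the question's input that contains an empty "[]" entry.
sample = "[6]1,2[](2014)[6]3-4"

# Original pattern: group 2 must consume at least one character,
# so after "[]" it swallows the following "(2014)".
original = re.findall(r"\[(.*?)\](.+?)(?=\[|\(|$)", sample)

# Same pattern with .+? relaxed to .*?, so group 2 may be empty.
relaxed = re.findall(r"\[(.*?)\](.*?)(?=\[|\(|$)", sample)

print(original)  # [('6', '1,2'), ('', '(2014)'), ('6', '3-4')]
print(relaxed)   # [('6', '1,2'), ('', ''), ('6', '3-4')]
```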

However, a [^\[(] negated character class here will be more efficient and elegant:

\[(.*?)\]([^\[(]*)

See this regex demo.

Or, a bit more efficient:

\[([^\]\[]*)\]([^\[(]*)

See another regex demo.

Details

  • \[ - a [
  • ([^\]\[]*) - Group 1: any 0+ chars other than [ and ]
  • \] - a ]
  • ([^\[(]*) - Group 2: any 0+ chars other than [ and (.
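As a quick check of the breakdown above, the following sketch applies the second pattern to a contiguous excerpt of the question's input. Python and the names PATTERN, sample and pairs are illustrative assumptions, not part of the original answer.

```python
import re

# Group 1: text inside [ ]; group 2: everything up to the next "[" or "(".
PATTERN = re.compile(r"\[([^\]\[]*)\]([^\[(]*)")

# Contiguous excerpt of the input string from the question.
sample = "(2011)[5]1(2012)[5]2,3-4[6]1,2[](2014)[6]3-4[7]1-2"

pairs = PATTERN.findall(sample)
for number, values in pairs:
    print(repr(number), repr(values))
```

Each tuple holds the bracketed number and the values that follow it. The empty [] entry yields a pair of empty strings instead of capturing (2014).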

6 Comments

It is working fine for most of the values, but in the case of (2005) [1]1,2,3,4[2]1(2008)[2]2–;3,4(2009)[3]1,2,3-4(2010)[4]1,2,3-4(2011)[5]1(2012)[5]2,3-4[6]1,2[](2014)[6]3-4[7]1-2(2015)[7]3-4[8]1-2(2016)[10]1[8]3-4[9]1-2,3-4(2017)[10]2(2011)[40](1-4)(2012)[41]0(2013)[42]1-4(2014)[43]0(2015)[44]1-4,1-2(2017)[46] it is not picking up (1-4). Can you help with that, please?
Replace [^\[(]* with [^\[(]*(?:\(\d+-\d+\)[^\[(]*)* or a more generic [^\[(]*(?:\((?!\d{4}\))[^()]*\)[^\[(]*)*. It would be better if you could explain what kind of value you expect after the number in brackets. Note I'm on a mobile now and cannot check if my pattern works well or not.
So I expect: a - a year or empty brackets, b - anything or empty brackets, c - something here. The format can be any of the following: (a)[b]c (a)[b]c[b]c[b]c (a)[b]c(a)[b]c[b]c (a)[b](a)[b]c[b]c
I have updated the question with the possible formats. I need to split the string so I can match it as group 1 -> year, group 2 -> value in [ ], group 3 -> comma-separated values.
Can you please update the expression so it captures 1 - anything in ( ), 2 - anything in [ ], 3 - the comma-separated values after [ ]?
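The thread ends without a tested pattern for this request. One possible sketch, not the answerer's method, combines the generic pattern suggested in the comment above with an alternation for the year and carries the most recent year forward in Python. It assumes every year is exactly four digits in parentheses, so a four-digit value such as (1234) after [b] would be read as a year. The names TOKEN and split_entries are hypothetical.

```python
import re

# Either a "(yyyy)" year marker, or a "[b]c" entry whose value c may
# contain parenthesised parts that are not four-digit years.
TOKEN = re.compile(
    r"\((\d{4})\)"
    r"|\[([^\]\[]*)\]([^\[(]*(?:\((?!\d{4}\))[^()]*\)[^\[(]*)*)"
)

def split_entries(text):
    """Return (year, bracket_value, values) rows, reusing the last year seen."""
    rows = []
    year = None  # assumption: entries before any year marker get None
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            year = m.group(1)            # matched a "(yyyy)" marker
        else:
            rows.append((year, m.group(2), m.group(3)))
    return rows

# Excerpt based on the extra input posted in the comments.
sample = "(2011)[40](1-4)(2012)[41]0(2013)[42]1-4(2017)[46]"
rows = split_entries(sample)
for row in rows:
    print(row)
```

On this excerpt the (1-4) value is kept as group 3 of the [40] entry, and the trailing [46] produces an empty value string.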