I have a string containing exactly one pair of parentheses (and some words between them), and lots of other words.

How would one create a regex to split the string into [ words before (, words between (), words after )]?

e.g.

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"

would be split into

[ "a   bbbb cccc     dd", "ee fff ggg", "hhh iii jk" ]

I've tried

line = re.compile("[^()]+").split(line)

but it doesn't work.

2 Answers

2

It seems you also want to strip the whitespace immediately before and after ( and ). You could split on either parenthesis together with any whitespace around it:

>>> import re
>>> line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
>>> re.split(r'\s*[\(\)]\s*', line)
['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
>>>
>>> # to reassign the result to line, as in your attempt ...
>>> line = re.compile(r'\s*[\(\)]\s*').split(line)
>>> line
['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
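
As for why the original attempt fails: the pattern [^()]+ matches the runs of text that are not parentheses, so split() treats exactly the words you want as separators and throws them away. A minimal sketch of the difference, assuming Python 3 and the sample line from the question (the variable names attempt and parts are only for illustration):

```python
import re

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"

# Original attempt: the pattern matches everything EXCEPT the parentheses,
# so split() discards the words and keeps only what lies between matches.
attempt = re.compile(r"[^()]+").split(line)
print(attempt)  # ['', '(', ')', '']

# Splitting on a parenthesis plus any adjacent whitespace keeps the words.
# Inside a character class, ( and ) do not need to be escaped.
parts = re.split(r"\s*[()]\s*", line)
print(parts)  # ['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
```

In other words, the pattern passed to split() should describe the delimiter, not the content you want to keep.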

1

To split the string into three parts, I think the simplest option is to use three capture groups: (some_regex)(another_regex)(yet_another_regex). In your case, the first part is any run of characters other than (, followed by a literal (, then any run of characters other than ), followed by a literal ), and finally whatever characters remain.

Therefore the regex is ([^(]*)\(([^)]*)\)(.*), which you can then use to retrieve the groups (your desired output, except that the whitespace next to the parentheses is kept):

>>> import re
>>> line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
>>> pattern = re.compile(r'([^(]*)\(([^)]*)\)(.*)')
>>> pattern.match(line).groups()
('a   bbbb cccc     dd     ', ' ee fff ggg ', '    hhh iii jk')

With:

  • ([^(]*) the first group
  • ([^)]*) the second group
  • (.*) the last group
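
If you also want the surrounding whitespace removed, one possible approach is to strip each group after matching. The sketch below assumes Python 3; split_on_parens and PAREN_PATTERN are hypothetical names chosen for illustration, and returning None when there is no pair of parentheses is just one way to handle that case:

```python
import re

# Same three-group pattern as above: before "(", between "(" and ")", after ")".
PAREN_PATTERN = re.compile(r'([^(]*)\(([^)]*)\)(.*)')

def split_on_parens(text):
    """Return [before, inside, after] with whitespace stripped, or None if no match."""
    match = PAREN_PATTERN.match(text)
    if match is None:
        # No "(...)" pair found; the caller decides what to do.
        return None
    return [group.strip() for group in match.groups()]

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
result = split_on_parens(line)
print(result)  # ['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
```

Note that .* does not match newlines by default, so for multi-line input you would probably need to pass re.DOTALL when compiling the pattern.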
