
I want to split a string such as Si(C3(COOH)2)4(H2O)7 into the following list:

[Si, (C3(COOH)2), 4, (H2O), 7]

That is, each entire parenthesized expression becomes an element by itself. I've tried a number of different combinations with re.findall(), without success. Any help is greatly appreciated.

  • You can't parse arbitrarily nested structures with Python's built-in re module, so a regex alone can't tell which ( or ) to split on. Commented Nov 11, 2016 at 18:44
  • ((C3(COOH)2) should be (C3(COOH)2). Commented Nov 11, 2016 at 22:39
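For what it's worth, the built-in re module can still handle this particular input if you accept a fixed maximum nesting depth. The sketch below assumes groups contain at most one inner level of parentheses, which is true for Si(C3(COOH)2)4(H2O)7; deeper nesting would need the inner pattern expanded by hand, and with deeper input this pattern silently returns wrong tokens. (The third-party regex package supports recursive patterns, but that is outside the standard library.)

```python
import re

# Alternative 1: '(' ... ')' whose body is any mix of non-paren characters
#                and simple '(...)' groups, i.e. at most ONE inner level.
# Alternative 2: a run of characters that are not parentheses.
# Assumption: input nesting never goes deeper than one inner level.
PATTERN = re.compile(r'\((?:[^()]|\([^()]*\))*\)|[^()]+')

print(PATTERN.findall('Si(C3(COOH)2)4(H2O)7'))
# ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']
```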

1 Answer


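You have to scan the string yourself, keeping track of the nesting depth. The significant 'events' are 'at beginning of string', 'at (', 'at )', and 'at end of string'. At each event, check the current depth to decide whether a top-level token has just ended (and should be appended to the result), then update the depth.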

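inn = 'Si(C3(COOH)2)4(H2O)7'
out = ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']  # expected result
res = []   # collected tokens
beg = 0    # start index of the current top-level token
dep = 0    # current parenthesis nesting depth
for i, c in enumerate(inn):
    if c == '(':
        # A top-level '(' ends any plain text before it.
        if dep == 0 and beg < i:
            res.append(inn[beg:i])
            beg = i
        dep += 1
    elif c == ')':
        if dep == 0:
            raise ValueError("')' without prior '('")
        elif dep == 1:
            # Closing a top-level group: emit the whole '(...)' chunk.
            res.append(inn[beg:i+1])
            beg = i+1
        dep -= 1
if dep != 0:
    raise ValueError("'(' without following ')'")
if beg < len(inn):
    # Trailing plain text. The length check avoids an empty token when the
    # string ends with ')' and a NameError on 'i' when the string is empty.
    res.append(inn[beg:])
print(res, res == out)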

# prints
# ['Si', '(C3(COOH)2)', '4', '(H2O)', '7'] True
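If you need this in more than one place, the same depth-tracking scan can be wrapped in a function. This is a minimal sketch; the name split_top_level is hypothetical, and it only treats ( and ) as grouping characters. The extra calls check the edge cases of a string ending in ')' and an empty string.

```python
def split_top_level(s):
    """Split s into top-level '(...)' groups and the text between them.

    Hypothetical helper: same depth-tracking scan as above.
    """
    tokens = []
    start = 0   # start index of the current top-level token
    depth = 0   # current nesting depth
    for i, c in enumerate(s):
        if c == '(':
            if depth == 0 and start < i:
                tokens.append(s[start:i])  # plain text before a top-level '('
                start = i
            depth += 1
        elif c == ')':
            if depth == 0:
                raise ValueError("')' without prior '('")
            depth -= 1
            if depth == 0:
                tokens.append(s[start:i + 1])  # a complete top-level group
                start = i + 1
    if depth != 0:
        raise ValueError("'(' without following ')'")
    if start < len(s):
        tokens.append(s[start:])  # trailing text after the last group
    return tokens


print(split_top_level('Si(C3(COOH)2)4(H2O)7'))  # ['Si', '(C3(COOH)2)', '4', '(H2O)', '7']
print(split_top_level('Si(H2O)'))               # ['Si', '(H2O)']
print(split_top_level(''))                      # []
```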