How do I split a string containing a math expression into a list?

Question

How do I tokenize the string:

"2+24*48/32"

Into a list:

['2', '+', '24', '*', '48', '/', '32']

You want to split a string into a list, but you don't want to use .split() because it returns a list? You're contradicting yourself. If you don't want a list, then what is it you do want?
– Jim, Commented Sep 17, 2008 at 23:20
@Jim: I think Jibmo means that split() only allows you to specify one delimiter, so he would have to call it once for '+', once for '-', once for '/', etc.
– readonly, Commented Sep 17, 2008 at 23:28
Sorry for the bad explanation. What I meant is that split() returns a list, which means that for the second split I now need to iterate over the strings within that list. A syntactically incorrect example: string = "2+2-2", list = string.split(+) returns ['2', '+', '2-2'], and now I need to iterate over 3 strings.
– Jibmo, Commented Sep 18, 2008 at 0:46
You should mention that you're working on a program that needs to be able to evaluate these strings as arithmetic expressions. Jerub's answer covers that, but that's because he's a mind reader.
– Allen, Commented Sep 18, 2008 at 2:57
Why not just use SymPy? It should do what you're trying to achieve.
– Brian Cain, Commented Sep 19, 2008 at 3:22

Glyph · Accepted Answer · 2020-07-16 20:17:17Z

51

It just so happens that the tokens you want split are already Python tokens, so you can use the built-in tokenize module. It's almost a one-liner; this program:

from io import StringIO
from tokenize import generate_tokens

STRING = 1  # index of the token text within each token tuple
print(
    list(
        token[STRING]
        for token in generate_tokens(StringIO("2+24*48/32").readline)
        if token[STRING]
    )
)

produces this output:

['2', '+', '24', '*', '48', '/', '32']
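For anyone who prefers named fields over a numeric index, here is a minimal sketch of the same idea. It assumes Python 3, where each token is a named tuple with `type` and `string` attributes. The function name `tokenize_expression` is just an illustration and not part of the answer above.

```python
import io
import tokenize
from token import NUMBER, OP

def tokenize_expression(text):
    """Return the number and operator tokens of a Python-style expression."""
    readline = io.StringIO(text).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        # Keep only numbers and operators; this drops the empty
        # NEWLINE and ENDMARKER tokens that the tokenizer appends.
        if tok.type in (NUMBER, OP)
    ]

print(tokenize_expression("2+24*48/32"))
# ['2', '+', '24', '*', '48', '/', '32']
```

Filtering on the token type instead of on a non-empty string also makes it explicit which kinds of tokens are kept, and parentheses come through as `OP` tokens as well.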

edited Jul 16, 2020 at 20:17

answered Sep 21, 2008 at 16:25

Glyph


3 Comments

Kiv Over a year ago

Great answer, I didn't realize this module existed :)

roskakori Over a year ago

Instead of manually assigning STRING=1 you could use the constant from the token module by doing a from token import STRING. This is particularly useful if you need several token constants.

Victor S Over a year ago

why would such a complicated answer be rated so high? It's a pretty simple question. Whatever happened to finding the cleanest, most concise answer?

Honest Abe · Accepted Answer · 2012-08-12 23:14:58Z

36

You can use split from the re module.

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

Example code:

import re
data = re.split(r'(\D)', '2+24*48/32')

\D

When the UNICODE flag is not specified, \D matches any non-digit character; this is equivalent to the set [^0-9].
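As a rough sketch of what this produces in practice: the capturing group keeps the operators in the result, but spaces or adjacent operators leave blank or empty entries behind, so it can help to filter those out. The helper name `split_on_non_digits` is made up for this example.

```python
import re

def split_on_non_digits(text):
    """Split on every non-digit character, keeping the delimiters."""
    parts = re.split(r'(\D)', text)
    # Drop empty strings and whitespace-only pieces that the split
    # leaves behind when delimiters sit next to each other.
    return [part for part in parts if part.strip()]

print(split_on_non_digits('2+24*48/32'))
# ['2', '+', '24', '*', '48', '/', '32']
print(split_on_non_digits('2 + 24'))
# ['2', '+', '24']
```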

edited Aug 12, 2012 at 23:14

Honest Abe


answered Sep 17, 2008 at 23:25

readonly


molasses · Accepted Answer · 2008-09-22 07:24:22Z

18

>>> import re
>>> re.findall(r'\d+|\D+', '2+24*48/32=10')

['2', '+', '24', '*', '48', '/', '32', '=', '10']

Matches consecutive digits or consecutive non-digits.

Each match is returned as a new element in the list.

Depending on the usage, you may need to alter the regular expression, for example if you need to match numbers with a decimal point.

>>> re.findall(r'[0-9\.]+|[^0-9\.]+', '2+24*48/32=10.1')

['2', '+', '24', '*', '48', '/', '32', '=', '10.1']
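One thing to watch for with the patterns above: a run of non-digits such as `*(` comes back as a single element. A possible variation, offered as a sketch rather than a drop-in replacement, matches one optional-decimal number or one non-space symbol at a time. The name `find_tokens` is hypothetical.

```python
import re

# A number with an optional fractional part, or any single character
# that is neither a digit nor whitespace.
TOKEN_PATTERN = re.compile(r'\d+(?:\.\d+)?|[^\s\d]')

def find_tokens(text):
    """Return numbers and single-character symbols, skipping whitespace."""
    return TOKEN_PATTERN.findall(text)

print(find_tokens('2+24*(48/32)=10.1'))
# ['2', '+', '24', '*', '(', '48', '/', '32', ')', '=', '10.1']
```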

edited Sep 22, 2008 at 7:24

answered Sep 18, 2008 at 2:39

molasses


Jerub · Accepted Answer · 2008-10-31 16:08:37Z

18

This looks like a parsing problem, and thus I am compelled to present a solution based on parsing techniques.

While it may seem that you want to 'split' this string, I think what you actually want to do is 'tokenize' it. Tokenization, or lexing, is the compilation step before parsing. I have amended my original example in an edit to implement a proper recursive descent parser here. This is the easiest way to implement a parser by hand.

import re

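patterns = [
    ('number', re.compile(r'\d+')),
    ('*', re.compile(r'\*')),
    ('/', re.compile(r'/')),
    ('+', re.compile(r'\+')),
    ('-', re.compile(r'-')),
]
# \s matches whitespace; \W would also swallow the operators
whitespace = re.compile(r'\s+')

def tokenize(string):
    while string:

        # strip off whitespace
        m = whitespace.match(string)
        if m:
            string = string[m.end():]
            continue

        # emit the first token type that matches at the current position
        for tokentype, pattern in patterns:
            m = pattern.match(string)
            if m:
                yield tokentype, m.group(0)
                string = string[m.end():]
                break
        else:
            # nothing matched: fail instead of looping forever
            raise ValueError("Unexpected character %r" % string[0])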

def parseNumber(tokens):
    tokentype, literal = tokens.pop(0)
    assert tokentype == 'number'
    return int(literal)

def parseMultiplication(tokens):
    product = parseNumber(tokens)
    while tokens and tokens[0][0] in ('*', '/'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '*':
            product *= parseNumber(tokens)
        elif tokentype == '/':
            product /= parseNumber(tokens)
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))

    return product

def parseAddition(tokens):
    total = parseMultiplication(tokens)
    while tokens and tokens[0][0] in ('+', '-'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '+':
            total += parseMultiplication(tokens)
        elif tokentype == '-':
            total -= parseMultiplication(tokens)
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))

    return total

def parse(tokens):
    tokenlist = list(tokens)
    returnvalue = parseAddition(tokenlist)
    if tokenlist:
        print('Unconsumed data', tokenlist)
    return returnvalue

def main():
    string = '2+24*48/32'
    for tokentype, literal in tokenize(string):
        print(tokentype, literal)

    print(parse(tokenize(string)))

if __name__ == '__main__':
    main()

Implementation of handling of brackets is left as an exercise for the reader. This example will correctly do multiplication before addition.
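For readers who want a starting point for that exercise, here is one possible sketch of bracket handling. It is a compact, standalone variant written for Python 3 and not the code above: the function names are my own, tokens are plain strings, and `/` is true division, so results come back as floats.

```python
import re

TOKEN = re.compile(r'\s*(\d+|[-+*/()])')

def tokenize(text):
    """Turn the expression into a list of number and symbol strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("Unexpected character %r" % text[pos])
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_atom(tokens):
    """A number, or a full sub-expression wrapped in brackets."""
    if not tokens:
        raise ValueError("Unexpected end of expression")
    tok = tokens.pop(0)
    if tok == '(':
        value = parse_addition(tokens)  # recurse for the bracketed part
        if not tokens or tokens.pop(0) != ')':
            raise ValueError("Expected ')'")
        return value
    return int(tok)

def parse_multiplication(tokens):
    value = parse_atom(tokens)
    while tokens and tokens[0] in ('*', '/'):
        if tokens.pop(0) == '*':
            value *= parse_atom(tokens)
        else:
            value /= parse_atom(tokens)
    return value

def parse_addition(tokens):
    value = parse_multiplication(tokens)
    while tokens and tokens[0] in ('+', '-'):
        if tokens.pop(0) == '+':
            value += parse_multiplication(tokens)
        else:
            value -= parse_multiplication(tokens)
    return value

def evaluate(text):
    tokens = tokenize(text)
    result = parse_addition(tokens)
    if tokens:
        raise ValueError("Unconsumed tokens: %r" % tokens)
    return result

print(evaluate('2+24*48/32'))    # 38.0
print(evaluate('(2+24)*48/32'))  # 39.0
```

The only structural change compared with a bracket-free parser is the extra `parse_atom` level at the bottom, which calls back into `parse_addition` when it sees an opening bracket.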

edited Oct 31, 2008 at 16:08

answered Sep 17, 2008 at 23:54

Jerub


3 Comments

Jibmo Over a year ago

I'm reading up on tokenizing now to understand it, so I'm not able to say where the problem is, though I think it's in the fact that this script will eval * and / at the same time, which is incorrect. The string 8/2*2 should print a result of 2, but it prints a result of 8.

Jibmo Over a year ago

Excuse me, I'm wrong. I always took BODMAS literally; it turns out multiplication and division have equal precedence, and whichever occurs first is evaluated first.

Air Over a year ago

In tokenize: Why use re to remove whitespace over a built-in string function?

Ber · Accepted Answer · 2008-09-19 07:37:18Z

6

This is a parsing problem, so neither regex nor split() is the "good" solution. Use a parser generator instead.

I would look closely at pyparsing. There have also been some decent articles about pyparsing in the Python Magazine.

answered Sep 19, 2008 at 7:37

Ber


Jiayao Yu · Accepted Answer · 2008-09-17 23:25:52Z

5

import re

s = "2+24*48/32"

p = re.compile(r'(\W+)')

p.split(s)

answered Sep 17, 2008 at 23:25

Jiayao Yu


Cristian · Accepted Answer · 2008-09-17 23:21:58Z

4

Regular expressions:

>>> import re
>>> splitter = re.compile(r'([+*/])')
>>> splitter.split("2+24*48/32")

You can expand the regular expression to include any other characters you want to split on.
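As an illustration of that expansion, the sketch below adds `-` and brackets to the character class. Note that once two delimiters can sit next to each other, as in `-(`, the split produces empty strings between them, so this version filters them out. The function name `split_operators` is an assumption made for the example.

```python
import re

# '-' is placed first in the class so it is not read as a range.
splitter = re.compile(r'([-+*/()])')

def split_operators(text):
    """Split on the listed operators and brackets, keeping them as items."""
    return [part for part in splitter.split(text) if part]

print(splitter.split("2-(24+48)/32"))
# ['2', '-', '', '(', '24', '+', '48', ')', '', '/', '32']
print(split_operators("2-(24+48)/32"))
# ['2', '-', '(', '24', '+', '48', ')', '/', '32']
```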

answered Sep 17, 2008 at 23:21

Cristian


habnabit · Accepted Answer · 2008-09-18 03:07:27Z

4

Another solution to this would be to avoid writing a calculator like that altogether. Writing an RPN parser is much simpler, and doesn't have any of the ambiguity inherent in writing math with infix notation.

import operator, math
calc_operands = {
    '+': (2, operator.add),
    '-': (2, operator.sub),
    '*': (2, operator.mul),
    '/': (2, operator.truediv),
    '//': (2, operator.floordiv),
    '%': (2, operator.mod),
    '^': (2, operator.pow),
    '**': (2, math.pow),
    'abs': (1, operator.abs),
    'ceil': (1, math.ceil),
    'floor': (1, math.floor),
    'round': (2, round),
    'trunc': (1, int),
    'log': (2, math.log),
    'ln': (1, math.log),
    'pi': (0, lambda: math.pi),
    'e': (0, lambda: math.e),
}

def calculate(inp):
    stack = []
    for tok in inp.split():
        if tok in calc_operands:
            n_pops, func = calc_operands[tok]
            # pop the arguments, then restore their original order
            args = [stack.pop() for x in range(n_pops)]
            args.reverse()
            stack.append(func(*args))
        elif '.' in tok:
            stack.append(float(tok))
        else:
            stack.append(int(tok))
    if not stack:
        raise ValueError('no items on the stack.')
    result = stack.pop()
    # anything still on the stack means the expression was malformed
    if stack:
        raise ValueError('%d item(s) left on the stack.' % len(stack))
    return result

calculate('24 38 * 32 / 2 +')
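If the input still arrives in infix form, one way to bridge the two is the shunting-yard algorithm, which reorders infix tokens into RPN. The following is a minimal sketch under a few assumptions: only the four left-associative operators and brackets are supported, and the names `PRECEDENCE` and `to_rpn` are invented for this example.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_rpn(expression):
    """Convert an infix expression string to a space-separated RPN string."""
    output, operators = [], []
    for tok in re.findall(r'\d+(?:\.\d+)?|[-+*/()]', expression):
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (operators and operators[-1] != '('
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                output.append(operators.pop())
            operators.append(tok)
        elif tok == '(':
            operators.append(tok)
        elif tok == ')':
            while operators and operators[-1] != '(':
                output.append(operators.pop())
            if not operators:
                raise ValueError("Unbalanced ')'")
            operators.pop()  # discard the matching '('
        else:
            output.append(tok)  # a number goes straight to the output
    while operators:
        op = operators.pop()
        if op == '(':
            raise ValueError("Unbalanced '('")
        output.append(op)
    return ' '.join(output)

print(to_rpn('2+24*48/32'))  # 2 24 48 * 32 / +
print(to_rpn('(2+24)*48'))   # 2 24 + 48 *
```

The resulting string is in the whitespace-separated form that an RPN evaluator like the one above splits on.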

answered Sep 18, 2008 at 3:07

habnabit


1 Comment

Jerub Over a year ago

Why don't you just go implement forth, it'll only be 5 more lines!

jbchichoko · Accepted Answer · 2012-01-14 16:21:15Z

1

>>> import re
>>> my_string = "2+24*48/32"
>>> my_list = re.findall(r"-?\d+|\S", my_string)
>>> print(my_list)

['2', '+', '24', '*', '48', '/', '32']

This will do the trick. I have encountered this kind of problem before.
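One caveat with the optional minus sign in that pattern: a subtraction such as `2-3` comes back as `['2', '-3']`, because the `-` is absorbed into the following number. A possible workaround, sketched here with a made-up helper name `tokenize_signed`, is to tokenize without the sign and attach a `-` to a number only when it appears at the start or right after another operator or an opening bracket.

```python
import re

OPERATORS = {'+', '-', '*', '/', '('}

def tokenize_signed(text):
    """Tokenize, treating '-' as a sign only where no operand precedes it."""
    tokens = []
    for tok in re.findall(r"\d+|\S", text):
        is_sign = (
            tok.isdigit()
            and tokens
            and tokens[-1] == '-'
            # The '-' is a sign if it is the first token or follows an operator.
            and (len(tokens) == 1 or tokens[-2] in OPERATORS)
        )
        if is_sign:
            tokens[-1] = '-' + tok
        else:
            tokens.append(tok)
    return tokens

print(re.findall(r"-?\d+|\S", "2-3"))  # ['2', '-3']
print(tokenize_signed("2-3"))          # ['2', '-', '3']
print(tokenize_signed("2*-3"))         # ['2', '*', '-3']
```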

answered Jan 14, 2012 at 16:21

jbchichoko


Timotheos · Accepted Answer · 2010-08-19 00:38:43Z

0

This doesn't answer the question exactly, but I believe it solves what you're trying to achieve. I would add it as a comment, but I don't have permission to do so yet.

I personally would take advantage of Python's maths functionality directly with exec:

expression = "2+24*48/32"
exec("result = " + expression)
print(result)
38.0

answered Aug 19, 2010 at 0:38

Timotheos


2 Comments

P Daddy Over a year ago

Forgive me if I'm wrong, but wouldn't it be preferable to use result = eval(expression)?

Timotheos Over a year ago

Indeed it would; my apologies.
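If the expression comes from user input, a more restrictive option than exec or eval is to parse it with the standard ast module and evaluate only the node types you expect. This is a minimal sketch assuming Python 3.8 or later and only the four basic operators. The name `safe_eval` and the two lookup tables are my own choices, not part of any answer here.

```python
import ast
import operator

BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(expression):
    """Evaluate a plain arithmetic expression, rejecting everything else."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](visit(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError("Unsupported element: %s" % type(node).__name__)
    return visit(ast.parse(expression, mode='eval'))

print(safe_eval("2+24*48/32"))  # 38.0
```

Operator precedence and brackets are handled by Python's own parser, so the evaluator only has to walk the resulting tree.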

Jay D · Accepted Answer · 2012-03-02 08:05:39Z

0

I'm sure Tim meant

splitter = re.compile(r'([\D])')

If you copy exactly what he has down, you only get the digits, not the operators.

edited Mar 2, 2012 at 8:05

Jay D


answered Sep 18, 2008 at 0:45


Xinyue Zhang · Accepted Answer · 2022-04-29 22:44:36Z

0

Here is an approach that I always use when splitting a str on different special characters. However, this code does not split on _, so if there is a _ in the str you want to split, you might need to do one more split.

import re

# initializing string
data = "2+24*48/32"

# printing original string
print("The original string is : " + data)

# Using re.findall()
# Splitting characters in String
res = re.findall(r"[\w']+", data)

# printing result
print("The list after performing split functionality : " + str(res))

answered Apr 29, 2022 at 22:44

Xinyue Zhang
