Parse User Input in Python

Question

I'm trying to parse user input so that each word, name or number is separated by whitespace (except for strings, which are delimited by double quotes) and pushed into a list. The list is printed along the way. I made a previous version of this code, but this time I want to use tokens to make things cleaner. Here's what I have so far, but it's not printing anything.

#!/util/bin/python
import re


def main ():


    for i in tokenizer('abcd xvc  23432 "exampe" 366'):
        print (i);



    tokens = (
  ('STRING', re.compile('"[^"]+"')),  # longest match
  ('NAME', re.compile('[a-zA-Z_]+')),
  ('SPACE', re.compile('\s+')),
  ('NUMBER', re.compile('\d+')),
)


def tokenizer(s):
  i = 0
  lexeme = []
  while i < len(s):
    match = False
    for token, regex in tokens:
      result = regex.match(s, i)
      if result:
        lexeme.append((token, result.group(0)))
        i = result.end()
        match = True
        break
    if not match:
      raise Exception('lexical error at {0}'.format(i))
  return lexeme




  main()

Hai Vu · Accepted Answer · 2014-03-30 17:54:06Z

2

I suggest using the shlex module to break up quoted strings:

>>> import shlex
>>> s = 'hello "quoted string" 123   \'More quoted string\' end'
>>> s
'hello "quoted string" 123   \'More quoted string\' end'
>>> shlex.split(s)
['hello', 'quoted string', '123', 'More quoted string', 'end']

After that, you can classify the tokens (string, number, ...) however you like. The only thing you lose is the whitespace: shlex uses it only as a separator and does not return it as a token.
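As a minimal sketch of that classification step (the classify and tokenize helpers and the STRING/NAME/NUMBER labels are assumptions borrowed from the question's token table, not part of shlex), each token can be labelled after splitting. It passes posix=False to shlex.split so that quoted strings keep their double quotes and can still be told apart from names and numbers:

```python
import re
import shlex

# Hypothetical categories, mirroring the question's STRING/NAME/NUMBER table.
NAME_RE = re.compile(r'[a-zA-Z_]+')
NUMBER_RE = re.compile(r'\d+')


def classify(token):
    """Return a (kind, text) pair for one token produced by shlex.split."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return ('STRING', token)
    if NUMBER_RE.fullmatch(token):
        return ('NUMBER', token)
    if NAME_RE.fullmatch(token):
        return ('NAME', token)
    raise ValueError('unrecognised token: {!r}'.format(token))


def tokenize(line):
    # posix=False keeps the surrounding quotes, so strings stay recognisable.
    return [classify(tok) for tok in shlex.split(line, posix=False)]


if __name__ == '__main__':
    for item in tokenize('abcd xvc  23432 "exampe" 366'):
        print(item)
```

Because shlex has already consumed the whitespace, the resulting list contains no SPACE entries; anything that is not a quoted string, a run of digits or a run of letters/underscores raises ValueError.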

Here is a simple demo:

import shlex

if __name__ == '__main__':
    line = 'abcd xvc  23432 "exampe" 366'
    tokens = shlex.split(line)
    for token in tokens:
        print('>{}<'.format(token))

Output:

>abcd<
>xvc<
>23432<
>exampe<
>366<

Update

If you want to keep the quote marks, call shlex.split() with posix=False:

    tokens = shlex.split(line, posix=False)

Output:

>abcd<
>xvc<
>23432<
>"exampe"<
>366<

edited Mar 30, 2014 at 17:54

answered Mar 30, 2014 at 4:58

Hai Vu

3 Comments

Manny O Over a year ago

So for that demo it will print the line without the space characters? @Hai Vu

Hai Vu Over a year ago

I updated my solution and you can see there is no space before or after each token.

Manny O Over a year ago

OK, got that, but I think it also strips away the double quotes (") around a string. How do I put the quotes back on? @Hai Vu

warvariuc · Accepted Answer · 2014-03-30 04:56:26Z

1

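I think your indentation is broken. In your version, main() is indented inside tokenizer, after its return statement, so it is never called; and tokens is defined as a local variable inside main, so tokenizer could not see it anyway. With both moved to module level, this works: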

#!/util/bin/python
import re

# Patterns are tried in order; the first one that matches at the current
# position wins. Raw strings avoid invalid-escape warnings for \s and \d.
tokens = (
    ('STRING', re.compile(r'"[^"]+"')),
    ('NAME', re.compile(r'[a-zA-Z_]+')),
    ('SPACE', re.compile(r'\s+')),
    ('NUMBER', re.compile(r'\d+')),
)


def main():
    for i in tokenizer('abcd xvc  23432 "exampe" 366'):
        print(i)


def tokenizer(s):
    i = 0
    lexeme = []
    while i < len(s):
        match = False
        for token, regex in tokens:
            result = regex.match(s, i)  # anchored match starting at position i
            if result:
                lexeme.append((token, result.group(0)))
                i = result.end()
                match = True
                break
        if not match:
            raise Exception('lexical error at {0}'.format(i))
    return lexeme


main()

prints:

('NAME', 'abcd')
('SPACE', ' ')
('NAME', 'xvc')
('SPACE', '  ')
('NUMBER', '23432')
('SPACE', ' ')
('STRING', '"exampe"')
('SPACE', ' ')
('NUMBER', '366')
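If the whitespace should not end up in the list, one option is to keep matching SPACE (so the scanner still advances past it) but not append it. A minimal sketch under that assumption; SKIP is a name introduced here for illustration, and the loop uses for/else in place of the match flag:

```python
import re

TOKENS = (
    ('STRING', re.compile(r'"[^"]+"')),
    ('NAME', re.compile(r'[a-zA-Z_]+')),
    ('SPACE', re.compile(r'\s+')),
    ('NUMBER', re.compile(r'\d+')),
)

# Hypothetical: token kinds that are matched (to advance) but not recorded.
SKIP = {'SPACE'}


def tokenizer(s):
    i = 0
    lexeme = []
    while i < len(s):
        for token, regex in TOKENS:
            result = regex.match(s, i)
            if result:
                if token not in SKIP:
                    lexeme.append((token, result.group(0)))
                i = result.end()
                break
        else:
            # The for loop finished without break: nothing matched at i.
            raise Exception('lexical error at {0}'.format(i))
    return lexeme


if __name__ == '__main__':
    print(tokenizer('abcd xvc  23432 "exampe" 366'))
```

Equivalently, the output of the unmodified tokenizer can be filtered afterwards with [t for t in tokenizer(s) if t[0] != 'SPACE'].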

answered Mar 30, 2014 at 4:56

warvariuc

2 Comments

Manny O Over a year ago

Yes, that was it, thanks. But I also want to push the tokens into a list without the spaces.

Hai Vu Over a year ago

Use shlex and you don't have to worry about space. See my solution.
