I am trying to parse the following text using pyparsing.

acp (SOLO1,
     "solo-100",
     "hi here is the gift"
     "Maximum amount of money, goes",
     430, 90)

jhk (SOLO2,
     "solo-101",
     "hi here goes the wind."
     "and, they go beyond",
     1000, 320)

I have tried the following code but it doesn't work.

flag = Word(alphas+nums+'_'+'-')
enclosed = Forward()
nestedBrackets = nestedExpr('(', ')', content=enclosed)
enclosed << (flag | nestedBrackets)

print list(enclosed.searchString (str1))

The comma (,) inside the quoted strings is producing undesired results.

  • There is no need to define nestedExpr with a Forward - nestedExpr will take care of all of the parenthetical nesting. For this, you just need section = flag + nestedExpr(content=Word(nums) | flag | quotedString) and then parse for OneOrMore(section). Commented Jul 31, 2015 at 18:53

1 Answer

Well, I might have oversimplified slightly in my comments - here is a more complete answer.

If you don't really have to deal with nested data items, then a single-level parenthesized data group in each section will look like this:

from pyparsing import (Group, OneOrMore, Suppress, Word, alphanums, alphas,
                       delimitedList, nums, quotedString)

LPAR, RPAR = map(Suppress, "()")
ident = Word(alphas, alphanums + "-_")
integer = Word(nums)

# treat consecutive quoted strings as one combined string
quoted_string = OneOrMore(quotedString)
# add parse action to concatenate multiple adjacent quoted strings;
# a single quoted string is returned unchanged
quoted_string.setParseAction(
    lambda t: ('"' + ''.join(s.strip('"\'') for s in t) + '"')
              if len(t) > 1 else t[0])
data_item = ident | integer | quoted_string

# section defined with no nesting
section = ident + Group(LPAR + delimitedList(data_item) + RPAR)

I wasn't sure whether you intentionally omitted the comma between two consecutive quoted strings, so I chose to follow the Python compiler's behavior, in which adjacent quoted strings are treated as one longer string; that is, "AB CD " "EF" is the same as "AB CD EF". This is done by defining quoted_string as OneOrMore(quotedString) and attaching a parse action that concatenates the contents of two or more component quoted strings.
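The concatenation step can be checked on its own, without pyparsing. The following is only a sketch: join_adjacent is a hypothetical name, and the plain list stands in for the tokens that pyparsing would hand to the parse action.

```python
def join_adjacent(tokens):
    """Merge adjacent quoted strings into one double-quoted string.

    Hypothetical stand-alone version of the parse action; `tokens` is a
    plain list of quoted strings, as quotedString would produce them.
    """
    if len(tokens) == 1:
        # a lone quoted string is returned unchanged
        return tokens[0]
    # strip the surrounding quote characters, join, and re-quote
    return '"' + "".join(s.strip('"\'') for s in tokens) + '"'


pieces = ['"hi here is the gift"', '"Maximum amount of money, goes"']
merged = join_adjacent(pieces)
print(merged)

# Python itself joins adjacent string literals the same way
python_joined = "AB CD " "EF"
print(python_joined == "AB CD EF")
```

Note that no space is inserted between the pieces, which is also how Python joins adjacent literals.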

Finally, we create a parser for the overall group:

results = OneOrMore(Group(section)).parseString(source)
results.pprint()

and get from your posted input sample:

[['acp',
  ['SOLO1',
   '"solo-100"',
   '"hi here is the giftMaximum amount of money, goes"',
   '430',
   '90']],
 ['jhk',
  ['SOLO2',
   '"solo-101"',
   '"hi here goes the wind.and, they go beyond"',
   '1000',
   '320']]]

If you do have nested parenthetical groups, then your section definition can be as simple as this:

# section defined with nesting
section = ident + nestedExpr()

As you have already found, though, this retains the commas as if they were significant tokens rather than just data separators.
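To see the comma behavior for yourself, here is a small sketch, assuming pyparsing is installed. The one-line sample is a shortened, made-up variant of the posted input, and flatten is a hypothetical helper used only to inspect the result.

```python
from pyparsing import Word, alphanums, alphas, nestedExpr

ident = Word(alphas, alphanums + "-_")
section = ident + nestedExpr()

# shortened, made-up variant of the posted input
sample = 'acp (SOLO1, "solo-100", 430, 90)'
tokens = section.parseString(sample).asList()
print(tokens)


def flatten(items):
    """Yield the leaf strings of an arbitrarily nested token list."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


# tokens that still carry a comma, either attached or standing alone
comma_tokens = [tok for tok in flatten(tokens) if "," in tok]
print(comma_tokens)
```

If the commas are only separators, the delimitedList version above is the simpler choice, since it drops them during parsing.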
