I get strings of logical expressions from a database and need to convert them into a list of lists for further evaluation. I have read a lot about string parsing but have not found an answer so far. To make the problem easier to understand, here are 3 examples:
input_string1 = '((A OR B) AND (C OR D)) OR E'
input_string2 = '(A AND ( B OR C ) AND D AND E)'
input_string3 = ' A OR ( B AND C ) OR D OR E'
Expected output:
Results_string1=[ ['A', 'C'], ['A','D'], ['B','C'], ['B','D'], ['E']]
Results_string2=[ ['A', 'B', 'D', 'E'], ['A', 'C', 'D', 'E'] ]
Results_string3=[ ['A'], ['B','C'], ['D'], ['E'] ]
So basically I need each expression fully expanded into an OR of AND groups (disjunctive normal form) and stored as a list of lists. Every AND condition is expressed by putting both operands in the same sublist, while every OR condition creates new sublists.
e.g. E AND F --> [['E', 'F']], E OR F --> [['E'], ['F']]
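To make these two rules concrete, here is a minimal sketch of how they combine, leaving the parsing aside. The helper names `or_combine` and `and_combine` are made up for illustration; each operand is assumed to already be a list of sublists.

```python
from itertools import product

def or_combine(*operands):
    """OR: keep every alternative of every operand as its own sublist."""
    return [alt for operand in operands for alt in operand]

def and_combine(*operands):
    """AND: pick one alternative from each operand and merge them."""
    return [sum(choice, []) for choice in product(*operands)]

# Single identifiers are the base case: one alternative with one element.
A, B, C, D, E = ([[name]] for name in "ABCDE")

# ((A OR B) AND (C OR D)) OR E, built by hand
result = or_combine(and_combine(or_combine(A, B), or_combine(C, D)), E)
print(result)
```

Built this way, the first example gives the same list as Results_string1, so the remaining problem is only turning the string into the nested structure.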
The strings from the database have arbitrary length and an arbitrary amount of brackets.
Does anyone have an idea how to define the grammar so that I can use, for example, the pyparsing package?
The start of the grammar so far is:
import pyparsing as pp
gene_id = pp.Word(pp.alphanums)
logical = ( pp.Keyword("AND") | pp.Keyword("OR") ).setName("logical")
l_brackets = (pp.Literal('(') ).setName('l_brackets')
r_brackets = ( pp.Literal(')') ).setName('r_brackets')
But how do I define the actual parser?
One of the main problems is that I don't know how to handle the arbitrarily nested brackets and the varying length of the string. I've been playing around with nestedExpr() from the pyparsing package but could not get the correct behavior so far.
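One possible direction, as a sketch rather than a definitive solution: pyparsing's infixNotation helper (also available as infix_notation in pyparsing 3) builds the recursive grammar for nested brackets itself, so nestedExpr() may not be needed. The code below assumes that AND should bind tighter than OR when no brackets are given, which the question does not state. The function names `to_dnf` and `parse_to_dnf` are made up for this example.

```python
import itertools
import pyparsing as pp

gene_id = pp.Word(pp.alphanums)
AND = pp.Keyword("AND")
OR = pp.Keyword("OR")

# Operators are listed from highest to lowest precedence.
# Assumption: AND binds tighter than OR when brackets are missing.
expr = pp.infixNotation(
    gene_id,
    [
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

def to_dnf(node):
    """Turn the nested parse result into a list of AND groups."""
    if isinstance(node, str):
        return [[node]]                  # a single identifier
    if len(node) == 1:
        return to_dnf(node[0])           # defensive: unwrap a lone group
    # Each group looks like [operand, op, operand, op, operand, ...]
    operands = [to_dnf(child) for child in node[0::2]]
    if node[1] == "OR":
        # OR: every alternative becomes its own sublist
        return [alt for operand in operands for alt in operand]
    # AND: merge one alternative from each operand
    return [sum(choice, []) for choice in itertools.product(*operands)]

def parse_to_dnf(text):
    """Parse the whole string and expand it into a list of lists."""
    tree = expr.parseString(text, parseAll=True).asList()[0]
    return to_dnf(tree)

for s in ('((A OR B) AND (C OR D)) OR E',
          '(A AND ( B OR C ) AND D AND E)',
          ' A OR ( B AND C ) OR D OR E'):
    print(parse_to_dnf(s))
```

With parseAll=True, unbalanced brackets or trailing garbage raise a ParseException instead of being silently ignored. Note that the expansion can grow quickly for long AND chains of OR groups, since every combination becomes its own sublist.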