I am trying to parse a substring using re.

From the string in variable s, I would like to take everything up to the first ! (the string stored in s contains two !) and store it as a substring. From this substring (stored in variable result), I then want to parse another substring.

Here is the code:

import re
s='ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#!ecNumber*2.3.1.11#kmValue*0.081#kmValueMaximum*#!'


Data={}

result = re.search('%s(.*)%s' % ('ec', '!'), s).group(1)
print result
ecNumber = re.search('%s(.*)%s' % ('Number*', '#kmValue*'), result).group(1)
Data["ecNumber"]=ecNumber
print Data

The value for each tag in the substring (for example, ecNumber) is stored between * and # (for example, *2.4.1.11#). I attempted to parse the value stored for the tag ecNumber in the first substring. The output I obtain is

result='Number*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#!ecNumber*2.3.1.11#kmValue*0.081#kmValueMaximum*#'
{'ecNumber': '*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#!ecNumber*2.3.1.11#kmValue*0.081'}

The desired output is

result= 'ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#'
{'ecNumber': '2.4.1.11'}

I would like to store each tag and its corresponding value. For example,

{'ecNumber': '2.4.1.11','kmValue':'0.57','kmValueMaximum':'1.25'}

2 Answers

1

Although you are asking for a regular-expression solution, I would say it's much easier to use plain string operations for this problem, since the source string is well formatted.

For information before the first !:

print dict([i.split('*') for i in s.split('!', 1)[0].split('#') if i])

For all information in s:

print [dict([i.split('*') for i in j.split('#') if i]) for j in s.split('!') if j] 
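
The two one-liners above can also be unpacked into a small function, which may be easier to follow and to loop over. This is only a sketch: parse_records is a name made up for illustration, it assumes every non-empty field contains a *, and it uses print() with a single argument so that it runs on both Python 2 and Python 3.

```python
def parse_records(s):
    """Split s into records on '!' and each record into tag/value pairs."""
    records = []
    for chunk in s.split('!'):
        if not chunk:
            continue  # skip the empty piece left after the trailing '!'
        record = {}
        for field in chunk.split('#'):
            if not field:
                continue  # skip the empty piece left after the trailing '#'
            # Split on the first '*' only; raises ValueError if a field has no '*'.
            tag, value = field.split('*', 1)
            record[tag] = value
        records.append(record)
    return records


s = ('ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#!'
     'ecNumber*2.3.1.11#kmValue*0.081#kmValueMaximum*#!')
records = parse_records(s)

# One dict per substring; records[0] is the part before the first '!'.
for record in records:
    print(record)
```

A tag with nothing between * and # (kmValueMaximum in the second substring) ends up with an empty string as its value.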

4 Comments

Thanks a lot for suggesting this wonderful alternative. Could you please give some advice on how to loop through each substring? i.e. substring1=ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25# substring2=ecNumber*2.3.1.11#kmValue*0.081#kmValueMaximum*#
I tried the following, but it didn't work: 'substring=[0]*2 for j,i in zip(range(0,2),range(0,2)): substring[j]= dict([i.split('*') for i in s.split('!', j)[0].split('#') if i]) print substring'
s.split('!', j) should be s.split('!', 1)
From your original question, you only need the information before the first !. If you need all the information, try print [dict([i.split('*') for i in j.split('#') if i]) for j in s.split('!') if j]
1

You can try this:

import re
s='ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#' 
new_data = re.findall(r'(?<=^)[a-zA-Z]+(?=\*)|(?<=#)[a-zA-Z]+(?=\*)|(?<=\*)[-\d\.]+(?=#)', s)
final_data = dict([new_data[i:i+2] for i in range(0, len(new_data)-1, 2)])

Output:

{'kmValue': '0.57', 'kmValueMaximum': '1.25', 'ecNumber': '2.4.1.11'}
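
A shorter variant, sketched here under the assumption that tags contain only letters and that values never contain #, captures each tag and value as a pair. It also shows one way to cut the string at the first !. In the question's patterns, .* is greedy, so it runs to the last possible match, and the unescaped * in 'Number*' and '#kmValue*' is read as a quantifier rather than a literal asterisk. The names first_record, pairs and data are only illustrative.

```python
import re

s = ('ecNumber*2.4.1.11#kmValue*0.57#kmValueMaximum*1.25#!'
     'ecNumber*2.3.1.11#kmValue*0.081#kmValueMaximum*#!')

# [^!]* stops at the first '!', unlike a greedy .* which runs to the last one.
first_record = re.match(r'[^!]*', s).group(0)

# Each field is: tag, a literal '*' (escaped), a value (anything but '#',
# possibly empty), then '#'. Two groups make findall return (tag, value) tuples.
pairs = re.findall(r'([A-Za-z]+)\*([^#]*)#', first_record)
data = dict(pairs)
print(data)
```

Because the value group is [^#]*, a negative value such as kmValue*-999# is picked up without any change to the pattern.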

2 Comments

Thank you so much. Could you please tell me how to use the above for parsing kmValue ({'kmValue': '-999'})? I tried altering the index of new_data, but it didn't work.
Thank you so much.
