I am trying to parse the sample input TEST_STRING1 below:
import re
TEST_STRING1 = """Using definitions from (yyyy/mm/dd): 2016/6/8
The following files are collected:
File: Test.exe
Source: Google
avping blob: 123123
Downloaded 3 Files
"""
def fun():
    regex_exp = re.compile(r"File:\s(?P<File>[^\n\r\t]+?)[\n\r\t\s]*?"
                           r"Source:\s(?P<Source>.*)[^\w\d]*?"
                           r"avping\sblob:\s(?P<Avping_blob>([A-F]|[a-f]|[0-9]){6})")
    result = {}
    result['Files'] = []
    for m in re.finditer(regex_exp, TEST_STRING1):
        result['Files'].append(m.groupdict())
    print(result)

if __name__ == "__main__":
    fun()
The output of the above code is:
{'Files': [{'Source': 'Google', 'File': 'Test.exe', 'Avping_blob': '123123'}]}
I want to make some fields in the input optional, such as avping blob. For example:
TEST_STRING1 = """Using definitions from (yyyy/mm/dd): 2016/6/8
The following files are collected:
File: Test.exe
Source: Google
Downloaded 3 Files
"""
In that case, the regex above returns no match.
I have updated the regex to:
regex_exp = re.compile(r"(File:\s(?P<File>[^\n\r\t]+?)[\n\r\t\s]*?"
                       r"Source:\s(?P<Source>.*)[^\w\d]*?"
                       r"|avping\sblob:\s(?P<Avping_blob>([A-F]|[a-f]|[0-9]){6}))")
by adding | before the last line. But then the alternation gives two separate matches:
{'Files': [{'Source': 'Google', 'File': 'Test.exe', 'Avping_blob': None}, {'Source': None, 'File': None, 'Avping_blob': '123123'}]}
How should I write a regex that matches both input types (with and without the optional fields)? Thanks
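One direction that might work, as a minimal sketch rather than a confirmed solution: instead of using |, wrap the avping blob part in an optional non-capturing group, (?:...)?, so the File/Source part is always required and the blob is captured only when present. The names WITH_BLOB, WITHOUT_BLOB, PATTERN and parse_files are made up for this illustration, and the simplified character classes assume each field sits on its own line as in the samples above.

```python
import re

# Sample inputs copied from the question: one with the optional line, one without.
WITH_BLOB = """Using definitions from (yyyy/mm/dd): 2016/6/8
The following files are collected:
File: Test.exe
Source: Google
avping blob: 123123
Downloaded 3 Files
"""

WITHOUT_BLOB = """Using definitions from (yyyy/mm/dd): 2016/6/8
The following files are collected:
File: Test.exe
Source: Google
Downloaded 3 Files
"""

# Hypothetical pattern: File and Source are mandatory, and the whole
# "avping blob" line sits inside an optional non-capturing group.
PATTERN = re.compile(
    r"File:\s(?P<File>[^\n\r\t]+)\s*"
    r"Source:\s(?P<Source>[^\n\r]*)"
    r"(?:\s*avping\sblob:\s(?P<Avping_blob>[A-Fa-f0-9]{6}))?"
)


def parse_files(text):
    """Return one dict per File/Source block; Avping_blob is None when absent."""
    return {"Files": [m.groupdict() for m in PATTERN.finditer(text)]}


if __name__ == "__main__":
    print(parse_files(WITH_BLOB))
    print(parse_files(WITHOUT_BLOB))
```

With this shape, each File block yields a single match for both inputs, and Avping_blob comes back as None when the line is missing instead of producing a second, mostly empty match.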