This works on your example:
>>> import json, re
>>> txt='''\
... commit 34343asdfasdf343434asdfasdfas
... Author: John Doe <[email protected]>
... Date: Wed Jun 25 09:51:49 2014 +0800'''
>>> json.dumps({k:v for k,v in re.findall(r'^([^\s]+)\s+(.+?)$', txt, re.M)})
{"commit": "34343asdfasdf343434asdfasdfas", "Date:": "Wed Jun 25 09:51:49 2014 +0800", "Author:": "John Doe <[email protected]>"}
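In case the pattern is hard to read, here is the same regex written with re.VERBOSE so each piece can carry a comment. This is only an annotated sketch of the pattern above; the variable names `pattern` and `line` are mine, not part of the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (re.X).
pattern = re.compile(r'''
    ^          # start of a line (re.M makes ^ match at every line start)
    ([^\s]+)   # group 1, the key: a run of non-whitespace, e.g. "commit" or "Author:"
    \s+        # the whitespace between key and value
    (.+?)      # group 2, the value: lazy, but forced to reach the end of the line
    $          # end of a line (re.M makes $ match at every line end)
''', re.M | re.X)

line = "Date: Wed Jun 25 09:51:49 2014 +0800"
print(pattern.findall(line))
# [('Date:', 'Wed Jun 25 09:51:49 2014 +0800')]
```

Note that the key still carries its colon here, because `[^\s]+` takes everything up to the first whitespace.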
If you have the git... part, followed by a blank line, just split it off with partition:
>>> json.dumps({k:v for k,v in re.findall(r'^([^\s]+)\s+(.+?)$',
...     txt.partition('\n\n')[2], re.M)})
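One thing to watch with partition: it returns a 3-tuple (head, separator, tail), and if the separator is not found the tail is an empty string, so the regex would then match nothing. A small sketch of that behavior; the leading text here is a made-up placeholder, not the real git... part from the question.

```python
# Hypothetical input: some leading text, a blank line, then the commit header.
with_blank = "some leading text\n\ncommit 34343asdfasdf343434asdfasdfas"
without_blank = "commit 34343asdfasdf343434asdfasdfas"

# Separator found: index 2 is everything after the first blank line.
tail_found = with_blank.partition('\n\n')[2]

# Separator missing: the whole string lands in index 0 and index 2 is ''.
tail_missing = without_blank.partition('\n\n')[2]

print(repr(tail_found))    # 'commit 34343asdfasdf343434asdfasdfas'
print(repr(tail_missing))  # ''
```

So if some of your inputs have no blank line, check for an empty tail and fall back to the full text.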
And if you want to lose the trailing :, move it outside the capturing group:
>>> json.dumps({k:v for k,v in re.findall(r'^(\w+):?\s+(.+?)$',
...     txt.partition('\n\n')[2], re.M)})
{"Date": "Wed Jun 25 09:51:49 2014 +0800", "commit": "34343asdfasdf343434asdfasdfas", "Author": "John Doe <[email protected]>"}
And if you want to lose the email address, add an optional non-capturing group for the <...> part:
>>> json.dumps({k:v for k,v in re.findall(r'^(\w+):?\s+(.+?)(?:\s*<[^>]*>)?$',
...     txt.partition('\n\n')[2], re.M)})
{"Date": "Wed Jun 25 09:51:49 2014 +0800", "commit": "34343asdfasdf343434asdfasdfas", "Author": "John Doe"}
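If you want this outside the REPL, the last version could be wrapped up roughly like this. The names `HEADER_RE` and `parse_commit_header` are just ones I picked, and john@example.com is a placeholder address standing in for the real one.

```python
import json
import re

# Key (without the colon), whitespace, value, then an optional <...> that is dropped.
HEADER_RE = re.compile(r'^(\w+):?\s+(.+?)(?:\s*<[^>]*>)?$', re.M)

def parse_commit_header(text):
    """Return a dict of the key/value pairs matched line by line in text."""
    return dict(HEADER_RE.findall(text))

sample = (
    "commit 34343asdfasdf343434asdfasdfas\n"
    "Author: John Doe <john@example.com>\n"  # placeholder email address
    "Date: Wed Jun 25 09:51:49 2014 +0800"
)

fields = parse_commit_header(sample)
# sort_keys makes the output order stable regardless of Python version.
print(json.dumps(fields, sort_keys=True))
# {"Author": "John Doe", "Date": "Wed Jun 25 09:51:49 2014 +0800", "commit": "34343asdfasdf343434asdfasdfas"}
```

Compiling the pattern once is only a convenience; calling re.findall with the raw pattern each time gives the same result.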