Converting a string into a dictionary - Pythonic way

Question

Experts,

I have written a program to convert a string into a dictionary. It produces the desired result, but I doubt that it is Pythonic. I would like to hear suggestions for improving it.

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''

I split each line on the colon (:) and store the pairs in a dictionary. The cities and HeadQuarters keys each contain another dictionary, which I handle with code like this:

if k == 'cities' : 
    D[k] = {}
    continue
elif k == 'HeadQuarters':
    D[k] = {}
    continue
elif k ==  'LA' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
elif k ==  'NY' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
else: 
    D[k]= v

The best way would be to use a standardized format like JSON instead of that particular string format you have there.
– BrenBarn, Commented Jan 13, 2015 at 18:47
If you are sure that your code is correct, then your question about the quality of your code might be on topic on codereview.stackexchange.com.
– jfs, Commented Jan 13, 2015 at 19:08
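
To illustrate the JSON suggestion from the first comment, here is a minimal sketch. It assumes you control the input format and can store the same data as JSON; the variable name json_txt and the JSON text itself are made up for illustration and are not part of the question. json.loads comes from the standard library.

```python
import json

# Hypothetical JSON version of the data from the question (an assumption:
# the original input is not JSON and would have to be produced this way).
json_txt = '''
{
    "name": "xxxx",
    "desgination": "yyyy",
    "cities": {"LA": "Los Angeles", "NY": "New York"},
    "HeadQuarters": {"LA": "LA", "NY": "NY"},
    "Country": "USA"
}
'''

# Nested JSON objects become nested dictionaries.
data = json.loads(json_txt)
print(data['cities']['NY'])
```

With this format no custom parsing code is needed, and the nesting is explicit instead of depending on indentation.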

jfs · Accepted Answer · 2015-01-13 19:36:25Z

1

You could use an existing yaml parser (PyYAML package):

import yaml # $ pip install pyyaml

data = yaml.safe_load(txt)

Result

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

The parser accepts your input as is, but small modifications are needed to make it more conformant YAML:

--- 
Country: USA
HeadQuarters: 
  LA: LA
  NY: NY
cities: 
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx

edited Jan 13, 2015 at 19:36

answered Jan 13, 2015 at 19:13

jfs


5 Comments

Vijay Over a year ago

I'm trying this ...installing yaml package now.

Bhargav Rao Over a year ago

Sorry to ask, but could you add a link explaining what exactly YAML is?

Vijay Over a year ago

Wow, this is perfect! Thanks @J.F. Sebastian

Vijay Over a year ago

@J.F.Sebastian, what modification is required? Could you please explain it briefly? I ask because, with similar output, the same function doesn't seem to work in my case. I went through the documentation but could not figure out the required modification.

jfs Over a year ago

@user596922: the modified data is included literally in the answer (added quotes and document marker). You could visit yamllint.com to check your input data. To see what the YAML could look like, call yaml.dump(python_object, default_flow_style=False).

Farmer Joe · Accepted Answer · 2015-01-13 19:15:26Z

You can use the split method here, a little recursion for your sub-dictionaries, and an assumption that your sub-dictionaries start with a tab (\t) or four spaces:

def txt_to_dict(txt):
    data = {}
    lines = txt.split('\n')
    i = 0
    while i < len(lines):
        try:
            # split the current line, not the whole text
            key, val = lines[i].split(':')
        except ValueError:
            # print "Invalid row format"
            i += 1
            continue
        key = key.strip()
        val = val.strip()
        if len(val) == 0:
            # empty value: collect the indented lines that follow
            i += 1
            sub = ""
            while i < len(lines) and (lines[i].startswith('\t') or
                                      lines[i].startswith('    ')):
                sub += lines[i] + '\n'
                i += 1
            data[key] = txt_to_dict(sub[:-1])  # remove last newline character
        else:
            data[key] = val
            i += 1
    return data

And then you would just call it on your variable txt as:

>>> print txt_to_dict(txt)
{'Country': 'USA', 'cities': {'NY': 'New York', 'LA': 'Los Angeles'}, 'name': 'xxxx', 'desgination': 'yyyy', 'HeadQuarters': {'NY': 'NY', 'LA': 'LA'}}

Sample output shown above. Creates the sub-dictionaries properly.

Added some error handling.
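
If the input can nest more than one level deep, an alternative to re-parsing the indented block recursively is to track indentation with a stack. The following is only a sketch under some assumptions that are not in the question: a tab counts as four spaces, a deeper indent always means "child of the nearest less-indented key with an empty value", and the names parse_indented and sample are made up for illustration.

```python
def parse_indented(txt):
    """Parse 'key : value' lines; deeper indentation means a nested dict."""
    root = {}
    # Stack of (indent, dict) pairs; the top is the dict currently being filled.
    stack = [(-1, root)]
    for line in txt.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        # Assumption: a leading tab is treated like four spaces.
        line = line.expandtabs(4)
        indent = len(line) - len(line.lstrip())
        key, _, val = line.partition(':')
        key, val = key.strip(), val.strip()
        # Leave every section indented at least as much as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if val:
            parent[key] = val
        else:
            # Empty value: start a sub-dictionary for the more-indented lines.
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root


# Shortened, hypothetical sample in the same format as the question's txt.
sample = '''
name    : xxxx
cities  :
    LA  : Los Angeles
    NY  : New York
Country : USA
'''
print(parse_indented(sample))
```

Because partition splits only on the first colon, a value that itself contains a colon is kept whole instead of raising an error.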

Bhargav Rao · Accepted Answer · 2015-01-13 19:32:19Z

1

Not sure if pythonic

import re

x = re.split(r':|\n',txt)[1:-1]
x = list(map(lambda x: x.rstrip(),x))
x = (zip(x[::2], x[1::2]))
d = {}
for i in range(len(x)):
    if not x[i][0].startswith('    '):
        if x[i][1] != '':
            d[x[i][0]] = x[i][1]
        else:
            t = x[i][0]
            tmp = {}
            i+=1
            while x[i][0].startswith('    '):
                tmp[x[i][0].strip()] = x[i][1]
                i+=1
            d[t] = tmp
print d

output

{'Country': ' USA', 'cities': {'NY': ' New York', 'LA': ' Los Angeles'}, 'name': ' xxxx', 'desgination': ' yyyy', 'HeadQuarters': {'NY': '  NY', 'LA': '  LA'}}

edited Jan 13, 2015 at 19:32

answered Jan 13, 2015 at 18:52

Bhargav Rao


2 Comments

Bhargav Rao Over a year ago

@user596922 Do you mean to say that those with a tab in front are at a different level?

Farmer Joe Over a year ago

@user596922 See my answer, it accounts for sub-dictionaries.

martineau · Accepted Answer · 2015-01-13 20:14:09Z

1

This produces the same output as your code. It was arrived at primarily by refactoring what you had and applying a few common Python idioms.

txt = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

D = {}                                                    # added to test code
for line in (line for line in txt.splitlines() if line):  #        "
    k, _, v = [s.strip() for s in line.partition(':')]    #        "

    if k in {'cities', 'HeadQuarters'}:
        D[k] = {}
        continue
    elif k in {'LA', 'NY'}:
        for k2 in (x for x in ('cities', 'HeadQuarters') if x in D):
            if k not in D[k2]:
                D[k2][k] = v
    else:
        D[k]= v

import pprint
pprint.pprint(D)

Output:

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

edited Jan 13, 2015 at 20:14

answered Jan 13, 2015 at 19:01

martineau


1 Comment

Vijay Over a year ago

I doubt this solution would solve the problem, because both cities and HeadQuarters have the same keys, so they get overwritten. D['cities'] and D['HeadQuarters'] should each be a dictionary with the corresponding key/value pairs.

kilojoules · Accepted Answer · 2015-01-13 19:10:00Z

0

This works

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''
di = {}
for line in txt.split('\n'):
   if len(line)> 1: di[line.split(':')[0].strip()]= line.split(':')[1].strip()

print di # {'name': 'xxxx', 'desgination': 'yyyy', 'LA': 'LA', 'Country': 'USA', 'HeadQuarters': '', 'NY': 'NY', 'cities': ''}

edited Jan 13, 2015 at 19:10

answered Jan 13, 2015 at 19:01

kilojoules


2 Comments

Yossi Over a year ago

I'll call it anything but pythonic

Vijay Over a year ago

di['cities'] should again be a dictionary with its own key/value pairs. In the above solution, di['cities'] and di['HeadQuarters'] are empty.
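
One possible way to address this objection while keeping a single flat loop is to remember which sub-dictionary is currently being filled. This is a sketch, not the answerer's code, and it assumes only one level of nesting, that nested lines start with whitespace, and that a key with an empty value opens a new section; the names section and indented are made up for illustration.

```python
txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''

di = {}
section = None  # sub-dictionary currently being filled, if any
for line in txt.splitlines():
    if ':' not in line:
        continue  # skip blank or malformed lines
    key, _, val = line.partition(':')
    indented = key[:1].isspace()  # nested lines start with whitespace
    key, val = key.strip(), val.strip()
    if indented and section is not None:
        section[key] = val        # belongs to the open section
    elif val:
        di[key] = val             # ordinary top-level pair
        section = None
    else:
        section = di[key] = {}    # empty value opens a new section

print(di)
```

Since LA and NY are stored in whichever section was opened last, the two sub-dictionaries no longer overwrite each other.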
