Experts,

I have written a program to convert a string into a dictionary. It produces the desired result, but I doubt that this is the Pythonic way to do it. I would like to hear suggestions.

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''

I split each line on the colon (:) and store the resulting pair in a dictionary. Here cities and HeadQuarters each contain another dictionary, which I handle with code like this:

if k == 'cities' : 
    D[k] = {}
    continue
elif k == 'HeadQuarters':
    D[k] = {}
    continue
elif k ==  'LA' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
elif k ==  'NY' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
else: 
    D[k]= v 
  • The best way would be to use a standardized format like JSON instead of that particular string format you have there. Commented Jan 13, 2015 at 18:47
  • or yaml Commented Jan 13, 2015 at 19:02
  • if you are sure that your code is correct then your question about the quality of your code might be on topic on codereview.stackexchange.com Commented Jan 13, 2015 at 19:08
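
As a rough illustration of the JSON suggestion in the comments above, here is a minimal sketch that assumes the input format can be changed at all, which the question does not say. The variable name `json_txt` is made up for this example; the values are copied from the question's text block.

```python
import json

# The question's data rewritten as JSON (assumption: the producer of the
# text can emit JSON instead of the aligned "key : value" layout).
json_txt = '''
{
  "name": "xxxx",
  "desgination": "yyyy",
  "cities": {"LA": "Los Angeles", "NY": "New York"},
  "HeadQuarters": {"LA": "LA", "NY": "NY"},
  "Country": "USA"
}
'''

# json.loads builds the nested dictionaries directly, with no custom parsing.
data = json.loads(json_txt)
print(data["cities"]["LA"])

# json.dumps turns the dictionary back into text.
print(json.dumps(data, indent=2, sort_keys=True))
```

With a standard format, the nested dictionaries come for free and the same module writes the data back out.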

5 Answers 5

1

You could use an existing yaml parser (PyYAML package):

import yaml # $ pip install pyyaml

data = yaml.safe_load(txt)

Result

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

The parser accepts your input as is, but making it more conformant YAML requires small modifications:

--- 
Country: USA
HeadQuarters: 
  LA: LA
  NY: NY
cities: 
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx

5 Comments

I'm trying this ...installing yaml package now.
Sorry to ask, but can you add a link as to what exactly is YAML
Wow.. This is perfect !!.. Thanks @ J.F. Sabastian
@J.F.Sebastian, what modification is required? Could you briefly explain it? I ask because, with similar input, the same function doesn't seem to work in my case. I went through the documentation but could not figure out the required modification.
@user596922: the modified data is included literally in the answer (added quotes and a document marker). You could visit yamllint.com to check your input data. To see what the YAML could look like, call yaml.dump(python_object, default_flow_style=False).
1

You can use the split method here, a little recursion for your sub-dictionaries, and an assumption that your sub-dictionaries start with a tab (\t) or four spaces:

def txt_to_dict(txt):
    data = {}
    lines = txt.split('\n')
    i = 0
    while i < len(lines):
        try:
            # split the current line, not the whole text
            key, val = lines[i].split(':')
        except ValueError:
            # print "Invalid row format"
            i += 1
            continue
        key = key.strip()
        val = val.strip()
        if len(val) == 0:
            # empty value: collect the indented lines that follow as a sub-dictionary
            i += 1
            sub = ""
            while i < len(lines) and lines[i].startswith(('\t', '    ')):
                sub += lines[i] + '\n'
                i += 1
            data[key] = txt_to_dict(sub[:-1])  # remove last newline character
        else:
            data[key] = val
            i += 1
    return data

And then you would just call it on your variable txt as:

>>> print txt_to_dict(txt)
{'Country': 'USA', 'cities': {'NY': 'New York', 'LA': 'Los Angeles'}, 'name': 'xxxx', 'desgination': 'yyyy', 'HeadQuarters': {'NY': 'NY', 'LA': 'LA'}}

Sample output shown above. Creates the sub-dictionaries properly.

Added some error handling.
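
The recursive version above treats any line starting with a tab or four spaces as belonging to the current sub-dictionary, so it handles one level of nesting. As a possible variation (a sketch, not part of the original answer), the indentation width can be tracked on a stack so that deeper nesting also works. The name `parse_indented` is made up for this example, and the sketch assumes indentation is consistent (all spaces or all tabs) and that an empty value always opens a nested block.

```python
def parse_indented(txt):
    """Parse 'key : value' lines into nested dicts, using indentation for nesting."""
    root = {}
    # Stack of (indent_width, dict); the last entry is the dict being filled.
    stack = [(-1, root)]
    for line in txt.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        key, _, val = line.partition(':')
        indent = len(key) - len(key.lstrip())
        key, val = key.strip(), val.strip()
        # Leave every block that is indented at least as deeply as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if val:
            parent[key] = val
        else:
            # Assumption: an empty value opens a nested block.
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root


sample = '''
name         : xxxx
cities       :
    LA       : Los Angeles
    NY       : New York
Country      : USA
'''
print(parse_indented(sample))
```

Because `str.partition` splits only at the first colon, a value that itself contains a colon is kept whole rather than raising an error.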

Comments

1

I'm not sure whether this is Pythonic:

import re

x = re.split(r':|\n', txt)[1:-1]
x = [s.rstrip() for s in x]
x = list(zip(x[::2], x[1::2]))  # (key, value) pairs; list() keeps them indexable
d = {}
for i in range(len(x)):
    if not x[i][0].startswith('    '):
        if x[i][1] != '':
            d[x[i][0]] = x[i][1]
        else:
            # empty value: the indented pairs that follow form a sub-dictionary
            t = x[i][0]
            tmp = {}
            i += 1
            while i < len(x) and x[i][0].startswith('    '):
                tmp[x[i][0].strip()] = x[i][1]
                i += 1
            d[t] = tmp
print(d)

output

{'Country': ' USA', 'cities': {'NY': ' New York', 'LA': ' Los Angeles'}, 'name': ' xxxx', 'desgination': ' yyyy', 'HeadQuarters': {'NY': '  NY', 'LA': '  LA'}}
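
The values in the output above keep their leading spaces because only `rstrip()` is applied after the split. One way to avoid that, sketched here as an alternative rather than as the answer's own method, is to let a regular expression separate indentation, key, and value for each line. The names `LINE_RE` and `split_line` are made up for this example.

```python
import re

# Optional indent, key, ':' and value. Whitespace around the key and the
# value is matched outside the capturing groups, so it is dropped.
LINE_RE = re.compile(r'^(\s*)([^\s:][^:]*?)\s*:\s*(.*?)\s*$')


def split_line(line):
    """Return (indent_width, key, value) for a 'key : value' line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # blank line or no 'key :' part
    indent, key, value = m.groups()
    return len(indent), key, value


print(split_line('    NY       :  NY    '))  # (4, 'NY', 'NY')
print(split_line('cities       : '))         # (0, 'cities', '')
```

The indent width returned here is the same information the `startswith('    ')` checks rely on, so it can be used to decide whether a pair belongs to a sub-dictionary.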

2 Comments

@user596922 Do you mean to say that those with a tab in front are at a different level?
@user596922 See my answer, it accounts for sub-dictionaries.
1

This produces the same output as your code. It was arrived at primarily by refactoring what you had and applying a few common Python idioms.

txt = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

D = {}                                                    # added to test code
for line in (line for line in txt.splitlines() if line):  #        "
    k, _, v = [s.strip() for s in line.partition(':')]    #        "

    if k in {'cities', 'HeadQuarters'}:
        D[k] = {}
        continue
    elif k in {'LA', 'NY'}:
        for k2 in (x for x in ('cities', 'HeadQuarters') if x in D):
            if k not in D[k2]:
                D[k2][k] = v
    else:
        D[k]= v

import pprint
pprint.pprint(D)

Output:

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

1 Comment

I doubt this solution solves the problem, because cities and HeadQuarters have the same keys, so they get overwritten. D['cities'] and D['HeadQuarters'] should each be a dictionary with the corresponding key/value pairs.
0

This works

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''
di = {}
for line in txt.split('\n'):
    if len(line) > 1:
        parts = line.split(':')
        di[parts[0].strip()] = parts[1].strip()

print di # {'name': 'xxxx', 'desgination': 'yyyy', 'LA': 'LA', 'Country': 'USA', 'HeadQuarters': '', 'NY': 'NY', 'cities': ''}

2 Comments

I'll call it anything but pythonic
di['cities'] should again be a dictionary with its own key/value set. In the above solution, di['cities'] and di['HeadQuarters'] are empty.
