3

This is more of a question about programming style. I scrape webpages for fields such as "Temperature: 51 - 62" and "Height: 1000-1500". The results are saved in a dictionary:

{"temperature": "51-62", "height":"1000-1500" ...... }

All keys and values are strings. Every key can map to one of many possible values. Now I want to convert this dictionary to a NumPy array/vector. I have the following concerns:

  • Each key corresponds to one index position in the array.
  • Each possible string value is mapped to one integer.
  • For some dictionaries, some keys are not available. For example, I also have a dictionary that has no "temperature" key, because that webpage doesn't contain such a field.

I am wondering what is the clearest and most efficient way to write such a conversion in Python. I am thinking of building another dictionary that maps each key to its index in the vector, plus one dictionary per key that maps its values to integers.

Another problem is that I am not sure about the range of values for some keys, so I want to dynamically keep track of the mapping between string values and integers. For example, I may find that key1 can map to a val1_8 in the future.

Thanks

2
  • possible duplicate of How to iterate over values in dictionary Python Commented May 14, 2014 at 23:50
  • @Anycorn, thanks for your prompt comment. My question is different from that post. Commented May 14, 2014 at 23:55

2 Answers

7

Try a pandas Series, it was built for this.

import pandas as pd
s = pd.Series({'a':1, 'b':2, 'c':3})
s.values # a numpy array
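
For the case in the question (string values, and some dictionaries missing keys), a minimal sketch along the same lines might look like the following. The `records` list is made-up sample data, and using `pd.factorize` to turn strings into integer codes is one possible choice rather than something stated in this answer; it uses -1 for missing values.

```python
import pandas as pd

# Hypothetical scraped records; the second page has no "temperature" field.
records = [
    {"temperature": "51-62", "height": "1000-1500"},
    {"height": "1500-2000"},
    {"temperature": "51-62", "height": "1000-1500"},
]

# One row per dictionary, one column per key; missing keys become NaN.
df = pd.DataFrame(records)

# Map each distinct string in a column to an integer code.
# pd.factorize numbers values in order of first appearance
# and marks missing values with -1.
codes = pd.DataFrame({col: pd.factorize(df[col])[0] for col in df.columns})

arr = codes.to_numpy()  # plain NumPy array, one column per key
```

Here `arr` has one row per scraped page, and the column order follows `df.columns`.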

3 Comments

One of the problems is that not all dictionaries have the same set of keys. Can pandas handle this? Thanks.
Yes. You may also want to check out pandas DataFrame for even more fun.
Thanks, I installed it; it is a truly powerful tool. I did pd.DataFrame({dd["name"]: pd.Series(dd) for dd in dictlist}), where dictlist is a list of dictionaries.
1
>>> from pprint import pprint
>>> import numpy as NP
>>> # a sequence of dictionaries in an iterable called 'data'
>>> # assuming that not all dicts have the same keys
>>> pprint(data)
  [{'x': 7.0, 'y1': 2.773, 'y2': 4.5, 'y3': 2.0},
   {'x': 0.081, 'y1': 1.171, 'y2': 4.44, 'y3': 2.576},
   {'y1': 0.671, 'y3': 3.173},
   {'x': 0.242, 'y2': 3.978, 'y3': 3.791},
   {'x': 0.323, 'y1': 2.088, 'y2': 3.602, 'y3': 4.43}]

>>> # get the unique keys across entire dataset
>>> keys = [list(dx.keys()) for dx in data]

>>> # flatten and coerce to 'set'
>>> keys = {itm for inner_list in keys for itm in inner_list}

>>> # create a map (look-up table) from each column index
>>> # of a NumPy array to a key (set ordering is arbitrary)

>>> LuT = dict(enumerate(keys))
>>> LuT
  {0: 'y2', 1: 'y3', 2: 'y1', 3: 'x'}

>>> idx = list(LuT.values())

>>> # pre-allocate NumPy array (100 rows is arbitrary)
>>> # number of columns is len(LuT.keys())

>>> D = NP.empty((100, len(LuT.keys())))

>>> keys = list(LuT.keys())
>>> keys
  [0, 1, 2, 3]

>>> # now populate the array from the original data using LuT
>>> for i, row in enumerate(data):
...     D[i,:] = [ row.get(LuT[k], 0) for k in keys ]

>>> D[:5,:]
  array([[ 4.5  ,  2.   ,  2.773,  7.   ],
         [ 4.44 ,  2.576,  1.171,  0.081],
         [ 0.   ,  3.173,  0.671,  0.   ],
         [ 3.978,  3.791,  0.   ,  0.242],
         [ 3.602,  4.43 ,  2.088,  0.323]])

Compare the last result (the first 5 rows of D) with data, above.

Note that the column ordering is preserved for every row (a single dictionary), even one with a less-than-complete set of keys. In other words, the first column of D always holds the values keyed to y2, and so on, even if the given row in data has no value stored for that key. For example, the third row in data has only two key/value pairs; in the third row of D, the first and last columns are both 0. These columns correspond to keys y2 and x, which are in fact the two missing keys.
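
The example above uses numeric values. For the string values in the question, where new values may turn up later, the same look-up-table idea could be extended roughly as follows. This is only a sketch: `encode`, `key_to_col`, `value_codes`, the `MISSING` sentinel and the sample `records` are all hypothetical names and data, not part of the original answer.

```python
from collections import defaultdict

import numpy as np

MISSING = -1  # assumed sentinel for keys absent from a record

# Column index per key, assigned the first time a key is seen.
key_to_col = {}
# Per-key table from string value to integer code; grows as new values appear.
value_codes = defaultdict(dict)


def encode(record):
    """Return {column index: integer code} for one scraped record."""
    encoded = {}
    for key, value in record.items():
        col = key_to_col.setdefault(key, len(key_to_col))
        table = value_codes[key]
        encoded[col] = table.setdefault(value, len(table))
    return encoded


# Hypothetical scraped records; the second one has no "temperature" key.
records = [
    {"temperature": "51-62", "height": "1000-1500"},
    {"height": "1500-2000"},
    {"temperature": "51-62", "height": "1000-1500"},
]

rows = [encode(r) for r in records]

# Allocate only after all records are encoded, so every key has a column.
D = np.full((len(rows), len(key_to_col)), MISSING, dtype=int)
for i, row in enumerate(rows):
    for col, code in row.items():
        D[i, col] = code
```

Because both tables are filled with `setdefault`, a previously unseen key or value simply receives the next free integer, and existing codes never change.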

1 Comment

Thanks for your detailed answer. I find Pandas to be a natural solution for my current problems.
