
Is there a way to vectorize an operation that takes several numpy arrays and puts them into a list of dictionaries?

Here's a simplified example. The real scenario might involve more arrays and more dictionary keys.

import numpy as np
x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

print [dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10)]

I might have thousands or hundreds of thousands of iterations in that xrange call. All the manipulation to create x, y, and z is vectorized (my real code is not as simple as the above), so there's only one for loop left to get rid of, which I expect would result in huge speedups.

I've tried using map with a function to create the dict, and all sorts of other workarounds. It seems the Python for loop is the slow part (as usual). I'm sort of stuck using dictionaries because of a pre-existing API requirement. However, solutions that avoid dicts in favor of record arrays or something similar would be interesting to see, though ultimately I don't think they will work with the existing API.
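For illustration, the map-based attempt mentioned above might look roughly like the sketch below, using the x, y, and z defined earlier (make_row is a hypothetical helper); it still builds one dict per element, so the per-row work stays in Python.

def make_row(x_, y_, z_):
    # hypothetical helper: build one dict per element
    return dict(x=x_, y=y_, z=z_)

rows = list(map(make_row, x, y, z))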

  • z=z[ii], good catch! Commented Nov 3, 2016 at 9:49
  • [dict(x=x_, y=y_, z=z_) for x_, y_, z_ in zip(x, y, z)] is as vectorised as pure Python gets. Commented Nov 3, 2016 at 9:50
  • Did you try a list and dict comprehension? Is it too slow? Commented Nov 3, 2016 at 9:50
  • I mean that the problem is maybe not the presence or absence of a loop but the construction of the list, which causes repeated memory allocations as it grows. Commented Nov 3, 2016 at 10:15
  • @durden2.0 FYI, the for in list comprehensions has little to do with Python's general for. The former has a lower-level implementation and is faster than the latter. Commented Nov 3, 2016 at 11:29

3 Answers


With your small example, I'm having trouble getting anything faster than the combination of list and dictionary comprehensions:

In [105]: timeit [{'x':i, 'y':j, 'z':k} for i,j,k in zip(x,y,z)]
100000 loops, best of 3: 15.5 µs per loop
In [106]: timeit [{'key':{'x':i, 'y':j, 'z':k}} for i,j,k in zip(x,y,z)]
10000 loops, best of 3: 37.3 µs per loop
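For a comparison that can be run outside IPython, here is a minimal sketch with the standard timeit module, using the small arrays from the question (absolute numbers will differ by machine):

import timeit
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# zip-based comprehension vs. indexed access, 100000 runs each
t_zip = timeit.timeit(lambda: [{'x': i, 'y': j, 'z': k} for i, j, k in zip(x, y, z)], number=100000)
t_idx = timeit.timeit(lambda: [{'x': x[i], 'y': y[i], 'z': z[i]} for i in range(10)], number=100000)
print(t_zip, t_idx)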

The alternatives that use array concatenation to join the arrays before partitioning are slower.

In [108]: timeit [{'x':x_, 'y':y_, 'z':z_} for x_, y_, z_ in np.column_stack((x,y,z))]
10000 loops, best of 3: 58.2 µs per loop

=======================

A structured array is easiest with recfunctions:

In [109]: from numpy.lib import recfunctions
In [112]: M=recfunctions.merge_arrays((x,y,z))
In [113]: M.dtype.names=['x','y','z']
In [114]: M
Out[114]: 
array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
       (4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
       (8, 18, 108), (9, 19, 109)], 
      dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])
In [115]: M['x']
Out[115]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Timed, this is much slower to build, but if you want to access all the x values at once, it's much better than fetching them from all the dictionaries.
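To illustrate that access difference, a sketch (dlist stands in for the list of dicts built with the comprehension earlier):

dlist = [{'x': i, 'y': j, 'z': k} for i, j, k in zip(x, y, z)]

xs_struct = M['x']                            # one vectorized field lookup
xs_dicts = np.array([d['x'] for d in dlist])  # rebuilds the array element by element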

np.rec.fromarrays((x,y,z),names=['x','y','z'])

produces a recarray with given names. About the same speed.
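For example, the recarray also allows attribute-style access on top of the usual field indexing (a quick sketch):

R = np.rec.fromarrays((x, y, z), names=['x', 'y', 'z'])
R.x       # array([0, 1, 2, ..., 9]), same data as R['x']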

I could also construct an empty array of the right dtype and shape and copy the arrays to it. That's probably as fast as this merge but more complicated to describe.

I'd suggest optimizing the data structure for use/access rather than construction speed. Generally you construct it once, and use it many times.

============

In [125]: dt=np.dtype([('x',x.dtype),('y',y.dtype),('z',z.dtype)])
In [126]: xyz=np.zeros(x.shape,dtype=dt)
In [127]: xyz['x']=x; xyz['y']=y; xyz['z']=z
# or for n,d in zip(xyz.dtype.names, (x,y,z)): xyz[n] = d
In [128]: xyz
Out[128]: 
array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
       (4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
       (8, 18, 108), (9, 19, 109)], 
      dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])
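If the pre-existing API really does require plain dictionaries in the end, the structured array can be unpacked as a last step; a minimal sketch (note this final conversion is itself a Python-level loop, and the values stay numpy scalars):

dicts = [dict(zip(xyz.dtype.names, row)) for row in xyz]
# -> [{'x': 0, 'y': 10, 'z': 100}, {'x': 1, 'y': 11, 'z': 101}, ...]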

2 Comments

Using a structured array is the more proper way to go, but there is no need to use recfunctions.
I added the non-recfunctions version.

Here is one (Num)?Pythonic way:

In [18]: names = np.array(['x', 'y', 'z'])
In [38]: map(dict, np.dstack((np.repeat(names[None, :], 10, axis=0), np.column_stack((x, y, z)))))
Out[38]: 
[{'x': '0', 'y': '10', 'z': '100'},
 {'x': '1', 'y': '11', 'z': '101'},
 {'x': '2', 'y': '12', 'z': '102'},
 {'x': '3', 'y': '13', 'z': '103'},
 {'x': '4', 'y': '14', 'z': '104'},
 {'x': '5', 'y': '15', 'z': '105'},
 {'x': '6', 'y': '16', 'z': '106'},
 {'x': '7', 'y': '17', 'z': '107'},
 {'x': '8', 'y': '18', 'z': '108'},
 {'x': '9', 'y': '19', 'z': '109'}]
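Note that the values in these dicts are strings ('0', '10', ...) rather than integers, because stacking the string names together with the numeric data forces a common string dtype. If the API needs numbers, a hedged post-processing sketch could cast them back:

str_dicts = map(dict, np.dstack((np.repeat(names[None, :], 10, axis=0),
                                 np.column_stack((x, y, z)))))
# cast the string values back to ints (assumes all-integer data, as here)
int_dicts = [{k: int(v) for k, v in d.items()} for d in str_dicts]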

Also, note that if you don't need all of the dictionaries at once, you can simply create a generator and access each item on demand:

(dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10))

If you want a nested dictionary, I suggest a list comprehension:

In [88]: inner = np.dstack((np.repeat(names[None, :], 10, axis=0), np.column_stack((x, y))))

In [89]: [{'connection': d} for d in map(dict, inner)]
Out[89]: 
[{'connection': {'x': '0', 'y': '10'}},
 {'connection': {'x': '1', 'y': '11'}},
 {'connection': {'x': '2', 'y': '12'}},
 {'connection': {'x': '3', 'y': '13'}},
 {'connection': {'x': '4', 'y': '14'}},
 {'connection': {'x': '5', 'y': '15'}},
 {'connection': {'x': '6', 'y': '16'}},
 {'connection': {'x': '7', 'y': '17'}},
 {'connection': {'x': '8', 'y': '18'}},
 {'connection': {'x': '9', 'y': '19'}}]
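If z (or more arrays) also has to go inside the nested dictionary, the zip-based pattern from the first answer extends directly and keeps the values numeric; a sketch:

nested = [{'connection': {'x': i, 'y': j, 'z': k}} for i, j, k in zip(x, y, z)]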

10 Comments

Nice! What about if I need to create nested dictionaries. For example, {'connection': {'x': x[ii], 'y': y[ii]}}. I realized I simplified my question a bit too much for my scenario.
@durden2.0 You mean, you want a nested dictionary for all the items? and without z? or it's optional?
Yes nested dictionary for all the items, z is required in my solution but doesn't really matter too much for the solution itself because the key is a nested dictionary without doing a python for loop.
@durden2.0 List comprehension's for is not like Python's for loop, because its iteration has been implemented in C. But if you want to do this in Numpy after all, since you want a Python object, your code needs to interact with the upper (Python) level, and this means you can't write pure numpythonic code. Commented Nov 3, 2016
@Kasramvd "List comprehension's for is not like Python's for loop, cause its iteration has been implemented in C", really? I wonder why they are slow then? One more relevant link: stackoverflow.com/questions/22108488/…

Here's an approach using a mix of NumPy and Pandas -

import numpy as np
import pandas as pd

# Stack into columns & create a pandas dataframe with appropriate col names
a = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
df = pd.DataFrame(a, columns=['x', 'y', 'z'])

# Convert to list of dicts
out = df.T.to_dict().values()

Sample run -

In [52]: x
Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [53]: y
Out[53]: array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [54]: z
Out[54]: array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

In [55]: out
Out[55]: 
[{'x': 0, 'y': 10, 'z': 100},
 {'x': 1, 'y': 11, 'z': 101},
 {'x': 2, 'y': 12, 'z': 102},
 {'x': 3, 'y': 13, 'z': 103},
 {'x': 4, 'y': 14, 'z': 104},
 {'x': 5, 'y': 15, 'z': 105},
 {'x': 6, 'y': 16, 'z': 106},
 {'x': 7, 'y': 17, 'z': 107},
 {'x': 8, 'y': 18, 'z': 108},
 {'x': 9, 'y': 19, 'z': 109}]
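As a side note, a more direct pandas route (a sketch, assuming the same DataFrame df) is the 'records' orientation of to_dict, which skips the transpose; wrapping each record also gives the nested form asked about in the comments:

out = df.to_dict('records')
# [{'x': 0, 'y': 10, 'z': 100}, {'x': 1, 'y': 11, 'z': 101}, ...]

nested = [{'connection': rec} for rec in out]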

2 Comments

Clever solution! I simplified my setup, but in reality I'm creating a nested dictionary, so I'll try to tweak this a bit. For example, the list of dicts I'm returning is actually more like [{'connection': {'xy': x[ii], 'yy': y[ii]}}].
@durden2.0 Could you edit the loop comprehension code in the question for that requirement?
