5

For example, given:

import numpy as np
data = np.array(
    [[0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0]])

I want to get a 3-dimensional array, looking like:

result = array([[[ 2.,  0.],
                 [ 0.,  2.]],

                [[ 0.,  2.],
                 [ 0.,  0.]]])

One way is:

newArray = np.zeros((2, 2, 2))
for row in data:
    newArray[row[0], row[1], row[2]] += 1

What I'm trying to do is the following:

for i in dimension1:
    for j in dimension2:
        for k in dimension3:
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()

This doesn't seem to work. I would like to get the desired result by sticking to my implementation rather than using the loop shown at the beginning (or any extra imports, e.g. Counter).

Thanks.

  • I believe you are using numpy. Commented Feb 6, 2014 at 18:37
  • Yes, that is correct, will fix it. Regardless, any ideas? Commented Feb 6, 2014 at 18:39
  • I think your first approach is easier to read and certainly faster. Commented Feb 6, 2014 at 18:45
  • @tobias_k I know! I'm just curious to see why the second approach isn't working :) Commented Feb 6, 2014 at 18:46
  • Please post syntactically correct code next time. Commented Feb 6, 2014 at 18:48

4 Answers

4

You can also use numpy.histogramdd for this:

>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2.,  0.],
        [ 0.,  2.]],

       [[ 0.,  2.],
        [ 0.,  0.]]])
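
A slightly more defensive variant, as a sketch: passing range explicitly pins the bin edges instead of letting histogramdd infer them from the minimum and maximum of the data. The output shape (2, 2, 2) comes from the question; the names shape and counts are only illustrative.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

shape = (2, 2, 2)  # assumed output shape: one bin per allowed index value

# With n bins over (0, n) the bin edges fall on the integers 0, 1, ..., n,
# so every integer index value lands in its own bin.
counts, _edges = np.histogramdd(data, bins=shape,
                                range=[(0, n) for n in shape])
```

With the edges fixed this way, the result does not depend on which index values happen to occur in data.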

2

The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.

Let's take this apart for the case (i,j,k) == (0,0,0)

data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the lines where the first number is 0.

Now from those lines we get the lines where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. And this is the problem. Because if we take those indices from data, i.e., data[data[data[:,0]==0, 1]==0] we do not get the rows where the first and second number are 0, but the 0th and 3rd row instead:

In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
                [1, 0, 1]])

And if we now filter for the rows where the third number is 0, we get the wrong result with respect to the original data.

And that's why your approach does not work. For better methods, see the other answers.
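
If you want to keep the triple loop from the question, one possible fix (a sketch, not the only way) is to build each condition against the full data array and combine them with &, so that every mask has one entry per row of data. The loop bounds of 2 are assumed from the example. Note that recent NumPy versions, as far as I know, raise an IndexError for a boolean index whose length does not match the array instead of silently selecting rows.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

result = np.zeros((2, 2, 2))  # assumed shape, taken from the example
for i in range(2):
    for j in range(2):
        for k in range(2):
            # Each comparison yields a mask of length len(data);
            # & keeps only the rows that match all three columns.
            mask = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = mask.sum()
```

This makes one pass over data per cell, so it is slower than the row loop or histogramdd, but it stays close to the original idea.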


2

You can do something like the following

#Get output dimension and construct output array.
>>> dshape = tuple(data.max(axis=0)+1)
>>> dshape
(2, 2, 2)
>>> out = np.zeros(dshape)

If you have numpy 1.8+:

# out.ravel() is a view of out here, so np.add.at updates out in place
np.add.at(out.ravel(), np.ravel_multi_index(data.T, dshape), 1)

Else:

#Get indices and unique the resulting array
>>> inds = np.ravel_multi_index(data.T, dshape)
>>> inds, inverse = np.unique(inds, return_inverse=True)
>>> values = np.bincount(inverse)

>>> values
array([2, 2, 2])

>>> out.flat[inds] = values
>>> out
array([[[ 2.,  0.],
        [ 0.,  2.]],

       [[ 0.,  2.],
        [ 0.,  0.]]])

NumPy versions before 1.8 do not have np.add.at, so the first snippet will not work there. Since ravel_multi_index may not be the fastest approach, you can also look into taking the unique rows of a NumPy array; in effect the two operations should be equivalent.
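
As a compact sketch of the same idea: np.add.at accumulates repeated indices, which a plain fancy-indexed += does not, and np.bincount on the flattened indices gives the same counts. The names out_bincount and flat are only illustrative.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

dshape = tuple(data.max(axis=0) + 1)

# Unbuffered in-place add: every row of data contributes 1, duplicates included.
out = np.zeros(dshape)
np.add.at(out, tuple(data.T), 1)

# Alternative: flatten the (i, j, k) triples and count occurrences.
flat = np.ravel_multi_index(tuple(data.T), dshape)
out_bincount = np.bincount(flat, minlength=int(np.prod(dshape))).reshape(dshape)
```

The minlength argument matters here: without it, bincount would stop at the largest index that occurs and the reshape could fail.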


1

Don't fear the imports. They're what make Python awesome.

This assumes that you already have the result matrix.

import numpy as np
data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]]
)
result = np.zeros((2,2,2))

# range of each dim, i.e. the (min, max) index allowed along each axis
dim_ranges = [(0, n - 1) for n in result.shape]
dim_ranges
# Out[]:
#     [(0, 1), (0, 1), (0, 1)]

# Multidimensional histogram will effectively "count" along each dim
sums,_ = np.histogramdd(data,bins=result.shape,range=dim_ranges)
result += sums
result
# Out[]:
#     array([[[ 2.,  0.],
#             [ 0.,  2.]],
#
#            [[ 0.,  2.],
#             [ 0.,  0.]]])

This solution works for a "result" ndarray of any shape. It also works when your "data" ndarray has indices that are out of bounds for the result matrix, because histogramdd does not count values outside the given range.
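
To illustrate the out-of-bounds behaviour, here is a small sketch with one deliberately invalid row added to a shortened version of the data. The extra row [5, 0, 0] is made up for this example.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [5, 0, 0]])  # hypothetical row, out of bounds for a (2, 2, 2) result

result = np.zeros((2, 2, 2))
ranges = [(0, n - 1) for n in result.shape]

# Rows with any coordinate outside its (min, max) range are not tallied.
sums, _ = np.histogramdd(data, bins=result.shape, range=ranges)
result += sums
```

Only the three valid rows end up in result; the fourth is dropped without raising an error, which may or may not be what you want.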
