3

I have an array of the form:

[[ 1. ,    2.,     3.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.3,    3.3,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.2,    3.2,    2.,     3.2,    3.2,    4.2  ],
 [ 1.1,    2.1,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.3,    3.5,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.7,    3.2,    2.,     3.2,    3.2,    4.2  ],
 [ 1.3,    2.2,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.3,    3.6,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.8,    3.2,    2.,     3.2,    3.2,    4.2  ],
 [ 1.4,    2.3,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.3,    3.7,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.9,    3.2,    2.,     3.2,    3.2,    4.2  ],
 [ 1.5,    2.1,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.89,   2.3,    3.5,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.7,    3.2,    2.,     3.2,    3.231,  4.2  ],
 [ 1.9,    2.2,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.22,   3.6,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.8,    3.2,    2.,     3.66,   3.2,    4.2  ],
 [ 1.89,   2.3,    1.,     1.,     3.,     3.,     4.   ],
 [ 1.3,    2.99,   3.7,    3.,     3.3,    3.3,    4.3  ],
 [ 1.2,    2.9,    3.2,    2.,     3.34,   3.2,    4.2  ]]

I want to split this array into subarrays based on the fourth column, i.e. one subarray whose fourth column equals 1, another whose fourth column equals 2, and so on. I do not know in advance which values occur in the fourth column.

For instance, the subarray for which the fourth column equals 1 is:

[[ 1.     2.     3.     1.     3.     3.     4.   ],
 [ 1.1    2.1    1.     1.     3.     3.     4.   ],
 [ 1.3    2.2    1.     1.     3.     3.     4.   ],
 [ 1.4    2.3    1.     1.     3.     3.     4.   ],
 [ 1.5    2.1    1.     1.     3.     3.     4.   ],
 [ 1.9    2.2    1.     1.     3.     3.     4.   ],
 [ 1.89   2.3    1.     1.     3.     3.     4.   ]]

4 Answers

3

To make a list of arrays:

y = [x[x[:,3]==k] for k in np.unique(x[:,3])]
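
For a self-contained illustration, here is a minimal sketch on a small made-up array. The data and the names `keys` and `groups` are assumptions for the example, not taken from the question. It also stores each subarray in a dict under its key, which may be handier than a bare list. Note that this approach makes one pass over the array per distinct value.

```python
import numpy as np

# Small made-up array; the fourth column (index 3) holds the group key.
x = np.array([
    [1.0, 2.0, 3.0, 1.0],
    [1.3, 2.3, 3.3, 3.0],
    [1.2, 2.2, 3.2, 2.0],
    [1.1, 2.1, 1.0, 1.0],
])

keys = np.unique(x[:, 3])            # sorted distinct values in column 3
y = [x[x[:, 3] == k] for k in keys]  # one boolean-mask selection per key

# Optional: map each key to its subarray.
groups = {float(k): sub for k, sub in zip(keys, y)}
print(groups[1.0])
```

Rows inside each subarray keep their original relative order, since boolean masking does not reorder anything.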

3

You can do this in O(N log N) time using numpy.argsort, numpy.array_split, numpy.diff and numpy.where:

>>> indices = np.argsort(arr[:, 3])
>>> arr_temp = arr[indices]
>>> np.array_split(arr_temp, np.where(np.diff(arr_temp[:,3])!=0)[0]+1)
[array([[ 1.  ,  2.  ,  3.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.89,  2.3 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.1 ,  2.1 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.9 ,  2.2 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.3 ,  2.2 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.5 ,  2.1 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ],
       [ 1.4 ,  2.3 ,  1.  ,  1.  ,  3.  ,  3.  ,  4.  ]]), array([[ 1.2  ,  2.8  ,  3.2  ,  2.   ,  3.66 ,  3.2  ,  4.2  ],
       [ 1.2  ,  2.7  ,  3.2  ,  2.   ,  3.2  ,  3.231,  4.2  ],
       [ 1.2  ,  2.9  ,  3.2  ,  2.   ,  3.2  ,  3.2  ,  4.2  ],
       [ 1.2  ,  2.9  ,  3.2  ,  2.   ,  3.34 ,  3.2  ,  4.2  ],
       [ 1.2  ,  2.8  ,  3.2  ,  2.   ,  3.2  ,  3.2  ,  4.2  ],
       [ 1.2  ,  2.7  ,  3.2  ,  2.   ,  3.2  ,  3.2  ,  4.2  ],
       [ 1.2  ,  2.2  ,  3.2  ,  2.   ,  3.2  ,  3.2  ,  4.2  ]]), array([[ 1.3 ,  2.3 ,  3.6 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.89,  2.3 ,  3.5 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.3 ,  2.3 ,  3.5 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.3 ,  2.22,  3.6 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.3 ,  2.3 ,  3.3 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.3 ,  2.99,  3.7 ,  3.  ,  3.3 ,  3.3 ,  4.3 ],
       [ 1.3 ,  2.3 ,  3.7 ,  3.  ,  3.3 ,  3.3 ,  4.3 ]])]
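
Note that within each group the rows above are not in their original order, because the default sort kind is not guaranteed to be stable. If the original order matters, passing `kind="stable"` to `np.argsort` should preserve it. A minimal sketch, with made-up data and the hypothetical names `order`, `boundaries` and `parts`:

```python
import numpy as np

# Small made-up array; the fourth column (index 3) holds the group key.
arr = np.array([
    [1.0, 2.0, 3.0, 1.0],
    [1.3, 2.3, 3.3, 3.0],
    [1.2, 2.2, 3.2, 2.0],
    [1.1, 2.1, 1.0, 1.0],
    [1.2, 2.7, 3.2, 2.0],
])

# A stable sort keeps rows with equal keys in their original relative order.
order = np.argsort(arr[:, 3], kind="stable")
arr_sorted = arr[order]

# Positions where the sorted key changes mark the start of a new group.
boundaries = np.where(np.diff(arr_sorted[:, 3]) != 0)[0] + 1
parts = np.array_split(arr_sorted, boundaries)

for part in parts:
    print(part[0, 3], part.shape)
```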

0

I adapted @ashwini-chaudhary's idea so that it returns the indices of interest for later iteration, and figured I would share it:

def split_idx_by_dim(dim_array):
    """Returns a sequence of arrays of indices of elements sharing the same value in dim_array"""
    idx = np.argsort(dim_array)
    sorted_cl_ids = dim_array[idx]
    split_idx = np.array_split(idx, np.where(np.diff(sorted_cl_ids) != 0)[0] + 1)
    return split_idx
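
One possible way to use it, as a sketch: the array `arr` below is made up, and the function is repeated only so that the snippet runs on its own. Each returned index array can be used to look up the group key and to select the matching rows from the original array.

```python
import numpy as np

def split_idx_by_dim(dim_array):
    """Return a list of index arrays, one per distinct value in dim_array."""
    idx = np.argsort(dim_array)
    sorted_cl_ids = dim_array[idx]
    return np.array_split(idx, np.where(np.diff(sorted_cl_ids) != 0)[0] + 1)

# Small made-up array; the fourth column (index 3) holds the group key.
arr = np.array([
    [1.0, 2.0, 3.0, 1.0],
    [1.3, 2.3, 3.3, 3.0],
    [1.2, 2.2, 3.2, 2.0],
    [1.1, 2.1, 1.0, 1.0],
])

index_groups = split_idx_by_dim(arr[:, 3])
for idx in index_groups:
    key = arr[idx[0], 3]   # all rows in idx share this value
    sub = arr[idx]         # rows of the original array in this group
    print(key, idx, sub.shape)
```

Keeping the indices rather than the subarrays means the same grouping can be applied to other arrays that share the same row order.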

-2

Look at the docs for splitting an array into multiple sub-arrays.

numpy.hsplit(ary, indices_or_sections)

Split an array into multiple sub-arrays horizontally (column-wise).

So say you have a 4x4 array A:

array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])

Then numpy.hsplit(A, 4) returns:

[array([[  0.],
       [  4.],
       [  8.],
       [ 12.]]), array([[  1.],
       [  5.],
       [  9.],
       [ 13.]]), array([[  2.],
       [  6.],
       [ 10.],
       [ 14.]]), array([[  3.],
       [  7.],
       [ 11.],
       [ 15.]])]

1 Comment

Sorry, wrong function. This one should be easy to use.
