This is a follow-up to the post "How to extract rows from an numpy array based on the content?". I used the following code to split rows based on the content of the second column:

np.split(sorted_a,np.unique(sorted_a[:,1],return_index=True)[1][1:])

The code worked fine, but when I later tried it on other cases (below), I found that it can give wrong results, as shown in CASE#1.

CASE#1
[[2748309, 246211, 1],
 [2748309, 246211, 2],
 [2747481, 246201, 54]]
OUTPUT#1
[]
[[2748309, 246211, 1],
 [2748309, 246211, 2],
 [2747481, 246201, 54]]
the result I want
[[2748309, 246211, 1],
 [2748309, 246211, 2]]
[[2747481, 246201, 54]]

I think the code may only split rows successfully when the numbers are small (have few digits), and I don't know how to solve the problem shown in CASE#1 above. So in this post I have 2 related questions:

1. How can I split rows that contain larger numbers (as shown in CASE#1)?

2. How can I split data that contains both of these cases: (#1) rows with the same element in the second column but different elements in the first, and (#2) rows with the same element in the first column but different elements in the second? (That is, can Python distinguish rows by considering the contents of the first and second columns simultaneously?)

Feel free to give me suggestions, thank you.

Update#1

The ravel_multi_index function can handle this kind of task for integer arrays, but how do I deal with arrays containing floats?

3 Answers


Here's an approach that treats the pair of elements from each row as an indexing tuple -

# Convert to linear index equivalents
lidx = np.ravel_multi_index(arr[:,:2].T,arr[:,:2].max(0)+1)

# Get sorted indices of lidx. Using those get shifting indices.
# Split along sorted input array along axis=0 using those.
sidx = lidx.argsort()
out = np.split(arr[sidx],np.unique(lidx[sidx],return_index=1)[1][1:])

Sample run -

In [34]: arr
Out[34]: 
array([[2, 7, 5],
       [3, 4, 6],
       [2, 3, 5],
       [2, 7, 7],
       [4, 4, 7],
       [3, 4, 6],
       [2, 8, 5]])

In [35]: out
Out[35]: 
[array([[2, 3, 5]]), array([[2, 7, 5],
        [2, 7, 7]]), array([[2, 8, 5]]), array([[3, 4, 6],
        [3, 4, 6]]), array([[4, 4, 7]])]

For more detail on converting a group of elements into an indexing tuple, please refer to this post.
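
The steps above can be put together as a self-contained sketch. The function name split_by_pairs is made up for illustration, and the stable sort is an addition so that rows inside a group keep their input order. It assumes the first two columns hold non-negative integers whose ranges multiply to something that still fits in the platform's index type.

```python
import numpy as np

def split_by_pairs(arr):
    """Group rows of a 2D integer array by their (column 0, column 1) pair.

    Hypothetical helper; assumes non-negative integers in the first two columns.
    """
    keys = arr[:, :2]
    # One linear index per row; rows with equal pairs get equal indices.
    lidx = np.ravel_multi_index(keys.T, keys.max(0) + 1)
    # A stable sort keeps the rows of each group in their input order.
    sidx = lidx.argsort(kind="stable")
    # The first occurrence of each distinct index in sorted order starts a group.
    starts = np.unique(lidx[sidx], return_index=True)[1]
    return np.split(arr[sidx], starts[1:])

arr = np.array([[2, 7, 5],
                [3, 4, 6],
                [2, 3, 5],
                [2, 7, 7],
                [4, 4, 7],
                [3, 4, 6],
                [2, 8, 5]])

groups = split_by_pairs(arr)
for g in groups:
    print(g)
```

With the sample array this should give the same five groups as the run shown above.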


2 Comments

Thank you for the suggestion and the detailed link. The ravel_multi_index function can handle an array of integers, but I am wondering how to do the same job on an array of floats, because the function seems to work only with integers.
@Heinz In the first step, when calculating lidx, use np.unique(arr[:,:2],return_inverse=1)[1].reshape(-1,2) in place of arr[:,:2].
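
For float data, the substitution suggested in the comment above might look like the following sketch. The function name split_by_pairs_float and the sample values are made up for illustration. It relies on exact equality of the float values, so numbers that differ only by rounding error would end up in different groups.

```python
import numpy as np

def split_by_pairs_float(arr):
    """Group rows by their (column 0, column 1) pair; values may be floats.

    Hypothetical helper: each distinct value is first replaced by an
    integer code, then the integer approach is applied to the codes.
    """
    # Integer code for every element of the first two columns.
    ids = np.unique(arr[:, :2], return_inverse=True)[1].reshape(-1, 2)
    # One linear index per row, computed from the integer codes.
    lidx = np.ravel_multi_index(ids.T, ids.max(0) + 1)
    sidx = lidx.argsort(kind="stable")
    starts = np.unique(lidx[sidx], return_index=True)[1]
    return np.split(arr[sidx], starts[1:])

# Made-up sample data with float keys.
farr = np.array([[0.5, 1.5, 1.0],
                 [0.25, 1.5, 2.0],
                 [0.5, 1.5, 3.0],
                 [0.5, 0.75, 4.0]])

fgroups = split_by_pairs_float(farr)
for g in fgroups:
    print(g)
```

Rows 0 and 2 share the pair (0.5, 1.5), so they should land in the same group, while the other two rows each form a group of their own.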

The numpy_indexed package (disclaimer: I am its author) contains functionality to efficiently perform these types of operations:

import numpy_indexed as npi
npi.group_by(a[:, :2]).split(a)

It has decent test coverage, so I'd be surprised if it tripped on your seemingly straightforward test case.

1 Comment

Thank you for the answer. I will download and test the numpy_indexed package, but I would prefer to solve this problem with just Python and numpy. Anyway, thank you.

If I apply that split line directly to your array, I get your result: an empty array plus the original.

In [136]: np.split(a,np.unique(a[:,1],return_index=True)[1][1:])
Out[136]: 
[array([], shape=(0, 3), dtype=int32), 
 array([[2748309,  246211,       1],
        [2748309,  246211,       2],
        [2747481,  246201,      54]])]

But if I first sort the array on the 2nd column, as specified in the linked answer, I get the desired answer, with the 2 arrays switched:

In [141]: sorted_a=a[np.argsort(a[:,1])]
In [142]: sorted_a
Out[142]: 
array([[2747481,  246201,      54],
       [2748309,  246211,       1],
       [2748309,  246211,       2]])
In [143]: np.split(sorted_a,np.unique(sorted_a[:,1],return_index=True)[1][1:])
Out[143]: 
[array([[2747481,  246201,      54]]), 
 array([[2748309,  246211,       1],
        [2748309,  246211,       2]])]
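
The empty first piece is not caused by the size of the numbers. np.unique returns the first-occurrence indices in the order of the sorted unique values, so on unsorted input they are not increasing split points (here the only split point left after [1:] is 0). A minimal sketch that does the sort inside a helper, so unsorted input is handled too; the name split_by_column is made up for illustration:

```python
import numpy as np

def split_by_column(a, col=1):
    """Split the rows of `a` into groups sharing the same value in column `col`.

    Hypothetical helper: sorts on the column first, then splits.
    """
    # Stable sort so rows with equal keys keep their input order.
    order = np.argsort(a[:, col], kind="stable")
    sorted_a = a[order]
    # On sorted data the first-occurrence indices are increasing,
    # so they are valid split points for np.split.
    starts = np.unique(sorted_a[:, col], return_index=True)[1]
    return np.split(sorted_a, starts[1:])

a = np.array([[2748309, 246211, 1],
              [2748309, 246211, 2],
              [2747481, 246201, 54]])

parts = split_by_column(a)
for p in parts:
    print(p)
```

As in the session above, the groups come out ordered by the value in the key column, not by their position in the input.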

1 Comment

Thanks for your answer, but how do I handle an unsorted input array?
