Split numpy array by unique values in column

Question

I have a large array that I imported from a csv (np.recfromcsv) and that I want to divide into smaller arrays by an ID column in said array. For example, my array (a) looks like:

[(842, 129826, 2018, 7246, '1/4/2009', 452, '1/4/2009', 452, '1/4/2009')
 (863, 129827, 2018, 7246, '1/7/2009', 452, '1/7/2009', 452, '1/7/2009')
 (890, 129828, 2019, 7246, '1/11/2009', 452, '1/11/2009', 452, '1/11/2009')
 ...,
 (339, 131268, 1085, 4211, '12/1/2009', 220, '12/2/2009', 220, '12/1/2009')
 (376, 131535, 1085, 4211, '12/8/2009', 220, '12/9/2009', 220, '12/8/2009')
 (470, 131536, 1087, 4211, '12/28/2009', 220, '12/29/2009', 220, '12/28/2009')]

And I would like to split this into arrays based on the third column (2018, 2019, 1085, etc.). I've been trying to find a way to use numpy's vsplit function with a list I generated of unique ID values (id_list = list(set(a['id']))), however I get the error: ValueError: vsplit only works on arrays of 2 or more dimensions. This makes me think the np.recfromcsv tool doesn't generate dimensions properly. Should I be using a different import tool?
I have also tried doing this in a simple loop:

for e in id_list:
    name = "id" + str(e)
    name = a[a['id']==e]

But this generates an error: SyntaxError: can't assign to operator. I know the problem is the dynamic variable, but I see no other way to achieve this without overwriting the array for each ID.

I'd really appreciate advice on how to figure this out.

Saullo G. P. Castro · Accepted Answer · 2013-07-10 18:51:33Z


To read a column from a recarray you do not pass the index, but the name, for example:

my_col = a['id']

So that your command will be:

id_list = list(set(a['id']))

Just as an observation: recfromcsv() works properly. Each field in the structured array (or record array) works like a 1D array. Maybe you could try np.loadtxt() with delimiter=',', which returns a 2D array if every column shares a single dtype.
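Since the array has one record per row, it is 1D, which is why vsplit refuses it. One possible way around the dynamic variable names from the question is to store each sub-array in a dictionary keyed by ID. The following is only a minimal sketch: the tiny array and its field names ('row', 'key', 'id', 'date') are made up for illustration and are not the asker's real columns.

```python
import numpy as np

# Hypothetical miniature of the asker's data; the field names are assumptions.
a = np.array(
    [(842, 129826, 2018, '1/4/2009'),
     (863, 129827, 2018, '1/7/2009'),
     (890, 129828, 2019, '1/11/2009'),
     (339, 131268, 1085, '12/1/2009')],
    dtype=[('row', 'i8'), ('key', 'i8'), ('id', 'i8'), ('date', 'U10')],
)

# A structured array holds one record per row, so it has a single dimension.
print(a.ndim)

# Build one sub-array per unique ID with a boolean mask, keyed by the ID itself
# instead of by a dynamically named variable.
groups = {int(e): a[a['id'] == e] for e in np.unique(a['id'])}

for id_value, rows in groups.items():
    print(id_value, len(rows))
```

Each sub-array is then available as groups[2018], groups[2019], and so on, so nothing is overwritten between iterations.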

edited Jul 10, 2013 at 18:51

answered Jul 10, 2013 at 18:08


5 Comments

AlmaThom Over a year ago

I got the same ValueError using np.loadtxt(), and test.shape reveals it is still a 1D-array: (1890,)

Saullo G. P. Castro Over a year ago

Could you make a sample of your input file available somewhere on the web? It seems that something is wrong...

AlmaThom Over a year ago

I've removed some of the superfluous columns for this task and put it here: filedropper.com/samplea

Saullo G. P. Castro Over a year ago

your file is fine... I've updated the answer adding how to access a column from a recarray...

AlmaThom Over a year ago

That is my fault, I attempted to simplify my code for the question and replaced the column name with a number without thinking. The ID list is generating correctly, but I still can't split the array.
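For the remaining splitting step, one option (not taken from the answer above, just a sketch) is to sort the records by the ID field and cut the sorted array at the first occurrence of each unique value with np.split, which accepts 1D arrays. The helper name split_by_field and the sample data are hypothetical.

```python
import numpy as np

def split_by_field(arr, field):
    """Split a 1D structured array into sub-arrays, one per unique value of `field`.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(arr[field], kind='stable')
    sorted_arr = arr[order]
    # In the sorted array, the first index of each unique value marks a group boundary.
    values, starts = np.unique(sorted_arr[field], return_index=True)
    # Skip the first boundary (index 0) so np.split yields one piece per value.
    return values, np.split(sorted_arr, starts[1:])

# Made-up sample with assumed field names 'row' and 'id'.
sample = np.array(
    [(1, 2018), (2, 1085), (3, 2018), (4, 2019)],
    dtype=[('row', 'i8'), ('id', 'i8')],
)
values, parts = split_by_field(sample, 'id')
for value, part in zip(values, parts):
    print(value, part['row'])
```

Compared with masking once per ID, this sorts the array a single time, which may matter when there are many distinct IDs.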
