I have a large array, imported from a CSV with np.recfromcsv, that I want to divide into smaller arrays by an ID column in that array.
For example, my array (a) looks like:
[(842, 129826, 2018, 7246, '1/4/2009', 452, '1/4/2009', 452, '1/4/2009')
(863, 129827, 2018, 7246, '1/7/2009', 452, '1/7/2009', 452, '1/7/2009')
(890, 129828, 2019, 7246, '1/11/2009', 452, '1/11/2009', 452, '1/11/2009')
...,
(339, 131268, 1085, 4211, '12/1/2009', 220, '12/2/2009', 220, '12/1/2009')
(376, 131535, 1085, 4211, '12/8/2009', 220, '12/9/2009', 220, '12/8/2009')
(470, 131536, 1087, 4211, '12/28/2009', 220, '12/29/2009', 220, '12/28/2009')]
I would like to split this into separate arrays based on the third column (2018, 2019, 1085, etc.). I've been trying to use numpy's vsplit function with a list of unique ID values (id_list = list(set(a['id']))), but I get the error: ValueError: vsplit only works on arrays of 2 or more dimensions. This makes me think that np.recfromcsv doesn't produce a two-dimensional array. Should I be using a different import tool?
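To illustrate the dimensionality issue, here is a minimal sketch under stated assumptions. The small array below is a hypothetical stand-in for the imported data: only the 'id' field name comes from the question, while 'row' and 'key' are made-up names and the remaining columns are left out. It shows that a structured array is one-dimensional (each record is a single element), and that one possible approach is to sort by 'id' and use np.split at the positions where the ID changes.

```python
import numpy as np

# Hypothetical stand-in for the imported data; only the 'id' field name
# is taken from the question, 'row' and 'key' are invented for this sketch.
a = np.array(
    [(842, 129826, 2018), (863, 129827, 2018), (890, 129828, 2019),
     (339, 131268, 1085), (376, 131535, 1085), (470, 131536, 1087)],
    dtype=[("row", "i8"), ("key", "i8"), ("id", "i8")],
)

# A structured array is 1-D: one element per record, not rows x columns.
print(a.ndim, a.shape)

# Sort by the ID field; a stable sort keeps the original order within each ID.
order = np.argsort(a["id"], kind="stable")
a_sorted = a[order]

# In the sorted array, the first index of each unique ID is a split point.
ids, first_idx = np.unique(a_sorted["id"], return_index=True)
groups = np.split(a_sorted, first_idx[1:])

for group_id, group in zip(ids, groups):
    print(group_id, len(group))
```

Here groups is a plain list of structured arrays, one per unique ID, in ascending ID order.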
I have also tried doing this in a simple loop:
    for e in id_list:
        name = "id" + str(e)
        name = a[a['id']==e]
But this generates an error: SyntaxError: can't assign to operator. I know the problem is the dynamically built variable name, but I see no other way to achieve this without overwriting the array for each ID.
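One possible way around the variable-name problem, sketched here under assumptions, is to keep each sub-array in a dictionary keyed by ID rather than in a separately named variable. The sample array is again a hypothetical stand-in: only the 'id' field name comes from the question, and 'row' and 'key' are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for the imported data; only 'id' is a real field name.
a = np.array(
    [(842, 129826, 2018), (863, 129827, 2018), (890, 129828, 2019),
     (339, 131268, 1085), (376, 131535, 1085), (470, 131536, 1087)],
    dtype=[("row", "i8"), ("key", "i8"), ("id", "i8")],
)

# One boolean mask per unique ID; the dictionary key replaces the
# dynamically named variable from the loop.
by_id = {int(e): a[a["id"] == e] for e in np.unique(a["id"])}

# Look up the records for a single ID.
print(by_id[2018])
```

Each mask scans the whole array, so this is simple but may be slow when there are many unique IDs.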
I'd really appreciate advice on how to figure this out.