I have a large array that I imported from a CSV file (with np.recfromcsv), and I want to divide it into smaller arrays by an ID column. For example, my array (a) looks like:

[(842, 129826, 2018, 7246, '1/4/2009', 452, '1/4/2009', 452, '1/4/2009')
 (863, 129827, 2018, 7246, '1/7/2009', 452, '1/7/2009', 452, '1/7/2009')
 (890, 129828, 2019, 7246, '1/11/2009', 452, '1/11/2009', 452, '1/11/2009')
 ...,
 (339, 131268, 1085, 4211, '12/1/2009', 220, '12/2/2009', 220, '12/1/2009')
 (376, 131535, 1085, 4211, '12/8/2009', 220, '12/9/2009', 220, '12/8/2009')
 (470, 131536, 1087, 4211, '12/28/2009', 220, '12/29/2009', 220, '12/28/2009')]

I would like to split this into arrays based on the third column (2018, 2019, 1085, etc.). I've been trying to use numpy's vsplit with a list of unique ID values that I generated (id_list = list(set(a['id']))), but I get the error: ValueError: vsplit only works on arrays of 2 or more dimensions. This makes me think np.recfromcsv doesn't generate the dimensions properly. Should I be using a different import tool?
I have also tried doing this in a simple loop:

for e in id_list:
    name = "id" + str(e)
    name = a[a['id']==e]

But this generates an error: SyntaxError: can't assign to operator. I know the problem is the dynamic variable, but I see no other way to achieve this without overwriting the array for each ID.

I'd really appreciate advice on how to figure this out.

1 Answer

To read a column from a recarray, pass the field name rather than a positional index. For example:

my_col = a['id']

So your command becomes:

id_list = list(set(a['id']))
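
As a minimal sketch of access by field name (the small array and its field names below are made up for illustration and are not taken from the question's file), np.unique is an alternative to set() that returns the distinct IDs already sorted:

```python
import numpy as np

# Hypothetical stand-in for the array returned by np.recfromcsv
a = np.array(
    [(842, 129826, 2018), (863, 129827, 2018), (890, 129828, 2019), (339, 131268, 1085)],
    dtype=[("row", "i8"), ("key", "i8"), ("id", "i8")],
)

my_col = a["id"]                       # select one field by name
id_list = np.unique(my_col).tolist()   # sorted distinct IDs as plain ints
print(id_list)
```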

As an observation: recfromcsv() is working properly. It returns a structured array (record array), which is one-dimensional, and each field behaves like a 1D array. You could try np.loadtxt() with delimiter=',', which returns a 2D array for purely numeric data.
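
To illustrate the point about dimensions, here is a rough sketch on made-up data (the array contents, the field names and the `groups` dictionary are assumptions for illustration only). A structured array has shape (n,), which is why vsplit rejects it. One possible way to split by ID anyway is to keep one boolean-mask selection per ID in a dictionary instead of creating a variable per ID; sorting by ID and calling np.split is another option:

```python
import numpy as np

# Made-up stand-in for the recfromcsv result; field names are assumptions
a = np.array(
    [
        (842, 2018, "1/4/2009"),
        (863, 2018, "1/7/2009"),
        (890, 2019, "1/11/2009"),
        (339, 1085, "12/1/2009"),
        (376, 1085, "12/8/2009"),
    ],
    dtype=[("row", "i8"), ("id", "i8"), ("date", "U10")],
)

print(a.shape)  # one-dimensional: one entry per record

# Option 1: dictionary keyed by ID, one sub-array per key
groups = {int(e): a[a["id"] == e] for e in np.unique(a["id"])}

# Option 2: sort by ID, then cut where the ID value changes
order = np.argsort(a["id"], kind="stable")
sorted_a = a[order]
ids, starts = np.unique(sorted_a["id"], return_index=True)
parts = np.split(sorted_a, starts[1:])  # parts[i] holds the rows for ids[i]
```

With the dictionary, groups[2018] plays the role that a variable named id2018 would have played in the loop from the question.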

5 Comments

I got the same ValueError using np.loadtxt(), and test.shape reveals it is still a 1D-array: (1890,)
Could you make a sample of your input file available somewhere on the web? It seems that something is wrong.
I've removed some of the superfluous columns for this task and put it here: filedropper.com/samplea
Your file is fine. I've updated the answer to show how to access a column from a recarray.
That is my fault, I attempted to simplify my code for the question and replaced the column name with a number without thinking. The ID list is generating correctly, but I still can't split the array.
