0

How can I split a 2D array by a grouping variable and return a list of arrays? The order of the groups matters.

To show the expected outcome, the equivalent in R is:

> (A = matrix(c("a", "b", "a", "c", "b", "d"), nr=3, byrow=TRUE)) # input
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "d" 
> (split.data.frame(A, A[,1])) # output
$a
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 

$b
     [,1] [,2]
[1,] "b"  "d" 

EDIT: To clarify: I'd like to split the array/matrix, A into a list of multiple arrays based on the unique values in the first column. That is, split A into one array where the first column has an a, and another array where the first column has a b.

I have tried the approach from Python equivalent of R "split"-function, but it puts all three rows in the first list and leaves the second list empty:

import numpy as np
import itertools
A = np.array([["a", "b"], ["a", "c"], ["b", "d"]])
b = A[:,0]

def split(x, f):
     return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
split(A, b) 

([array(['a', 'b'], dtype='<U1'),
  array(['a', 'c'], dtype='<U1'),
  array(['b', 'd'], dtype='<U1')],
 [])

I also tried numpy.split, using np.split(A, b), but it needs integers. I thought I might be able to use How to convert strings into integers in Python? to convert the letters to integers, but even when I pass integers, it doesn't split as expected:

c = np.transpose(np.array([1,1,2]))
np.split(A, c) # returns 4 arrays
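
The four arrays come from how np.split reads its second argument: a 1-D sequence is taken as the row positions at which to cut, not as a group label per row. The sketch below shows this on the same A, then derives cut positions from the places where the label in the first column changes. The names pieces, cuts and blocks are made up for the illustration, and the second part assumes rows of the same group are already adjacent.

```python
import numpy as np

A = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

# [1, 1, 2] means "cut before row 1, before row 1 again, and before row 2",
# which yields A[:1], A[1:1] (empty), A[1:2] and A[2:].
pieces = np.split(A, [1, 1, 2])
print([p.shape for p in pieces])

# Cut wherever the label in column 0 differs from the previous row.
# Assumption: rows belonging to one group are contiguous.
labels = A[:, 0]
cuts = np.flatnonzero(labels[1:] != labels[:-1]) + 1
blocks = np.split(A, cuts)
print(blocks)
```

If the groups are interleaved, this produces one block per run of equal labels rather than one block per label.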

Can this be done? Thanks.

EDIT: please note that this is a small example, and the number of groups may be greater than two and they may not be ordered.

2
  • Not sure I understand your expected output @user2957945 Commented Nov 14, 2018 at 18:45
  • okay, thanks @RafaelC -- I'll clarify Commented Nov 14, 2018 at 18:46

2 Answers

2

You can use pandas:

import pandas as pd
import numpy as np

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

listofdfs = {}
for n,g in pd.DataFrame(a).groupby(0):
    listofdfs[n] = g

listofdfs['a'].values

Output:

array([['a', 'b'],
       ['a', 'c']], dtype=object)

And,

listofdfs['b'].values

Output:

array([['b', 'd']], dtype=object)
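
Since the question says the order matters and the groups may not be sorted, it may help to know that groupby sorts the group keys by default. A minimal sketch, assuming you want the groups in order of first appearance: pass sort=False and collect the arrays in a dict. The input rows and the name groups are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input with interleaved, unsorted groups.
a = np.array([["b", "d"], ["a", "b"], ["b", "e"], ["a", "c"]])

# sort=False keeps the groups in order of first appearance;
# rows inside each group keep their original order.
groups = {key: g.to_numpy() for key, g in pd.DataFrame(a).groupby(0, sort=False)}

print(list(groups))
print(groups["b"])
```

Dicts preserve insertion order, so list(groups.values()) gives the list of arrays in that same order.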

Or, you could use itertools groupby:

import numpy as np
from itertools import groupby
l = [np.stack(list(g)) for k, g in groupby(a, lambda x: x[0])]

l[0]

Output:

array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

And,

l[1]

Output:

array([['b', 'd']], dtype='<U1')
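
Note that itertools.groupby only merges consecutive rows with the same key, so interleaved groups would come back as several separate pieces. A rough sketch of one way around that, assuming plain Python grouping is acceptable: collect the rows per key in a dict, then stack each group. The names rows_by_key and result are made up for the illustration.

```python
from collections import defaultdict

import numpy as np

# Interleaved groups: "a" and "b" rows are not adjacent.
a = np.array([["a", "b"], ["b", "d"], ["a", "c"], ["b", "d"]])

# Collect rows under their first-column value; the dict remembers
# the order in which each key was first seen.
rows_by_key = defaultdict(list)
for row in a:
    rows_by_key[str(row[0])].append(row)

# Stack each group's rows back into a 2D array.
result = {key: np.stack(rows) for key, rows in rows_by_key.items()}

print(list(result))
print(result["a"])
```

Use list(result.values()) if a list of arrays is wanted instead of a dict.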

2 Comments

Great, thanks Scott, that looks good. I'd considered coercing to a dataframe but I thought there may be array tools -- but this is good.
Brilliant, thank you very much. I'm traipsing through stackoverflow.com/questions/773/…, so your edit gives me the output for my understanding to work towards.
0

If I understand your question, you can do simple slicing, as in:

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

x,y=a[:2,:],a[2,:]

x
array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

y
array(['b', 'd'], dtype='<U1')

3 Comments

Hi G.Anderson, thank you for your answer. This would fail for a = np.array([["a", "b"], ["b", "d"], ["a", "c"], ["b", "d"]]), or if there were more groups. Apologies, maybe my example was too minimal.
I see. I answered before you edited about grouping the splits based on value. Perhaps this answer might help?
Thanks. That looks promising-- I'm just trying to tweak it to my example.
