I have a 3 x 3 NumPy array:

import numpy as np

X = np.array([[ 0.,  2.,  0.],
              [ 0.,  0.,  0.],
              [ 4., 22.,  0.]])

Each location within X corresponds to some relationship between a pair of the variables a, b and c, laid out as follows:

[  a & a,   a & b,   a & c]
[  b & a,   b & b,   b & c]
[  c & a,   c & b,   c & c]

So in the example array X, the number 2 corresponds to data that describes something about the relationship between variables a and b.

Now say I apply a condition to X such as X > 3, which produces the following:

array([[False, False, False],
       [False, False, False],
       [ True,  True, False]])

How do I then determine which pairs of variables from a, b and c the True values correspond to? Here we can see that they are c & a and c & b, but how do I extract this information programmatically?

Is there perhaps a way to assign names to fixed locations in a NumPy array?

I can do what I want as follows:

y = np.array([['a', 'a', 'a'],
              ['b', 'b', 'b'],
              ['c', 'c', 'c']])

z = np.array([['a', 'b', 'c'],
              ['a', 'b', 'c'],
              ['a', 'b', 'c']])

y[X > 3]

array(['c', 'c'], dtype='<U1')

And:

z[X > 3]

array(['a', 'b'], dtype='<U1')

I can then pair up the first elements of the two results to get c & a, and the second elements to get c & b.
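The pairing step described above could be sketched roughly as follows. This is only an illustration: the names `labels`, `mask` and `pairs` are not from the original post, and the two label grids are built with np.repeat instead of being typed out by hand, which is an assumption about how one might scale this to larger arrays.

```python
import numpy as np

X = np.array([[0., 2., 0.],
              [0., 0., 0.],
              [4., 22., 0.]])

# Hypothetical helper array holding the variable names once.
labels = np.array(['a', 'b', 'c'])
n = len(labels)

# y repeats each label along its row; z repeats the labels across each row.
y = np.repeat(labels[:, None], n, axis=1)
z = np.repeat(labels[None, :], n, axis=0)

mask = X > 3

# Boolean indexing returns elements in row-major order for both grids,
# so zipping the two results pairs up matching positions.
pairs = list(zip(y[mask].tolist(), z[mask].tolist()))
print(pairs)
```

For the example array this yields the pairs c & a and c & b, in that order.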

I'm not very experienced with the NumPy ecosystem, so it's unclear to me whether there is a better way to do this.

  • Have you tried pandas? Commented Dec 1, 2020 at 16:36
  • I know I can do this in pandas, but my actual data set consists of several 2500 x 2500 arrays, so about 6 million records. Pandas is likely not the best place to do this work given the data size and run time involved. Commented Dec 1, 2020 at 16:44
  • I just tried pandas with 10k by 10k and it's fine: generating random data of that size with NumPy takes ~2 seconds, and pandas takes ~5 milliseconds to turn it into a data frame. Commented Dec 1, 2020 at 16:49
  • Pandas is basically a fancy NumPy wrapper. The overhead of pandas over NumPy is negligible. Commented Dec 1, 2020 at 16:54

1 Answer

Another way is to keep a single array of labels and index it with the positions that np.where returns:

labels = np.array(['a','b','c'])

idx = np.array(np.where(X>3)).T
labels[idx]

Output:

array([['c', 'a'],
       ['c', 'b']], dtype='<U1')
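As a possible variation on the same idea, here is a minimal sketch that also keeps the matching values from X next to each label pair. It uses np.nonzero, which for a single boolean argument returns the same index arrays as np.where; the names `rows`, `cols`, `pairs` and `values` are illustrative and not part of the answer above.

```python
import numpy as np

X = np.array([[0., 2., 0.],
              [0., 0., 0.],
              [4., 22., 0.]])
labels = np.array(['a', 'b', 'c'])

# Row and column positions of every element satisfying the condition.
rows, cols = np.nonzero(X > 3)

# Look up the row label and column label for each hit, one pair per row.
pairs = np.stack([labels[rows], labels[cols]], axis=1)

# The values at those positions, in the same order as the pairs.
values = X[rows, cols]

for (r, c), v in zip(pairs.tolist(), values.tolist()):
    print(f"{r} & {c}: {v}")
```

Because only the short labels array and the index arrays are stored, this avoids building full-size label grids, which may matter for arrays of 2500 x 2500 or larger.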