NumPy: is there a way to provide a name to a location in a NumPy array?

Question

I have a 3 x 3 NumPy array:

import numpy as np

X = np.array([[0.,  2., 0.],
              [0.,  0., 0.],
              [4., 22., 0.]])

Each location in X corresponds to some relationship between a pair of the following variables:

[  a & a,   a & b,   a & c]
[  b & a,   b & b,   b & c]
[  c & a,   c & b,   c & c]

So in the example array X, the number 2 corresponds to data that describes something about the relationship between variables a and b.
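A minimal sketch of that convention, assuming a hypothetical `labels` array that lists the variables in row/column order: the cell at row i, column j describes the pair (labels[i], labels[j]).

```python
import numpy as np

X = np.array([[0.,  2., 0.],
              [0.,  0., 0.],
              [4., 22., 0.]])

# Hypothetical label array: entry k names both row k and column k of X
labels = np.array(['a', 'b', 'c'])

# Cell (i, j) describes the pair (labels[i], labels[j])
i, j = 0, 1
value = X[i, j]                               # 2.0
pair = (str(labels[i]), str(labels[j]))       # ('a', 'b')
print(f"{pair[0]} & {pair[1]}: {value}")      # a & b: 2.0
```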

Now, say I apply a condition to X, such as X > 3, which generates the following:

array([[False, False, False],
       [False, False, False],
       [ True,  True, False]])

How do I then determine which variables in my a, b, c universe the True values correspond to? We know they are c & a and c & b, but how do I extract this information programmatically?

Is there perhaps a way to assign names to fixed locations in a NumPy array?
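NumPy has no built-in notion of named cells, but one way to approximate it, sketched here under that assumption, is a string array with the same shape as X whose cell (i, j) holds "row & column". Indexing it with the same boolean mask returns the names of the matching cells. The `labels` list and `names` array are hypothetical.

```python
import numpy as np

X = np.array([[0.,  2., 0.],
              [0.,  0., 0.],
              [4., 22., 0.]])
labels = ['a', 'b', 'c']   # hypothetical variable names, in row/column order

# names[i, j] == f"{labels[i]} & {labels[j]}", same shape as X
names = np.array([[f"{r} & {c}" for c in labels] for r in labels])

mask = X > 3
selected = names[mask]        # boolean indexing walks the cells in row-major order
print(selected.tolist())      # ['c & a', 'c & b']
```

This grid stores one string per cell, and NumPy's fixed-width string dtype uses 4 bytes per character, so for a 2500 x 2500 array the index-based approaches are lighter on memory.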

I can do what I want as follows:

y =  np.array([[  'a',   'a',  'a'],
         [  'b',   'b',  'b'],
         [  'c',  'c', 'c']])

z =  np.array([[  'a',   'b',  'c'],
         [  'a',   'b',  'c'],
         [  'a',  'b', 'c']])
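Rather than typing y and z by hand, they can be generated from a single list of names. This sketch assumes np.meshgrid with indexing='ij', so the first output varies down the rows and the second across the columns:

```python
import numpy as np

labels = np.array(['a', 'b', 'c'])

# indexing='ij': y[i, j] == labels[i] (row name), z[i, j] == labels[j] (column name)
y, z = np.meshgrid(labels, labels, indexing='ij')
```

With the default indexing='xy' the two outputs would be swapped, which is why 'ij' is passed explicitly.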

y[X > 3]

array(['c', 'c'], dtype='<U1')

And:

z[X > 3]

array(['a', 'b'], dtype='<U1')

Then I can pair the two results element by element: the first entries give c & a, and the second entries give c & b.
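The element-by-element pairing can be done in one step with np.column_stack, which places y[mask] and z[mask] side by side so that each row is one (row variable, column variable) pair. This sketch reuses the y and z arrays from the question; `pairs` and `pair_names` are illustrative names.

```python
import numpy as np

X = np.array([[0.,  2., 0.],
              [0.,  0., 0.],
              [4., 22., 0.]])
y = np.array([['a', 'a', 'a'],
              ['b', 'b', 'b'],
              ['c', 'c', 'c']])
z = np.array([['a', 'b', 'c'],
              ['a', 'b', 'c'],
              ['a', 'b', 'c']])

mask = X > 3
# Row k of `pairs` is (row variable, column variable) for the k-th True cell
pairs = np.column_stack((y[mask], z[mask]))
# Optional: join each pair into a readable "c & a" style string
pair_names = [f"{r} & {c}" for r, c in pairs]
```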

I'm not very experienced with the NumPy ecosystem, so it's unclear to me whether there is a better way to do this.

I know I can do this in pandas, but my actual data set is several 2500 x 2500 arrays, so about 6 million records. Pandas is likely not the best place to do this work given the data size and run time involved.
– codingknob, Commented Dec 1, 2020 at 16:44
I just tried pandas with a 10k by 10k array and it's fine: generating random data of that size via NumPy takes ~2 seconds, and pandas takes ~5 milliseconds to turn it into a DataFrame.
– Sam Mason, Commented Dec 1, 2020 at 16:49
Pandas is basically a fancy NumPy wrapper. The overhead of pandas over NumPy is negligible.
– Quang Hoang, Commented Dec 1, 2020 at 16:54

Quang Hoang · Accepted Answer · 2020-12-01 16:39:05Z


Another way is to use a single array of labels and index into it with the row and column indices returned by np.where:

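labels = np.array(['a','b','c'])

# np.where(X > 3) returns (row indices, column indices); stacking them
# and transposing gives one (row, col) index pair per True cell
idx = np.array(np.where(X>3)).T
labels[idx]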

Output:

array([['c', 'a'],
       ['c', 'b']], dtype='<U1')
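NumPy documents np.argwhere(a) as almost the same as np.transpose(np.nonzero(a)), so the index array in this answer can also be built with np.argwhere. A minimal sketch using the question's X:

```python
import numpy as np

X = np.array([[0.,  2., 0.],
              [0.,  0., 0.],
              [4., 22., 0.]])
labels = np.array(['a', 'b', 'c'])

# Shape (n_matches, 2): one (row, col) index pair per True cell, in row-major order
idx = np.argwhere(X > 3)

# Same indices as the answer's np.array(np.where(X > 3)).T
same = np.array_equal(idx, np.array(np.where(X > 3)).T)   # True

# Fancy indexing maps every index to its name, keeping shape (n_matches, 2)
named = labels[idx]
```

Only the matching positions are converted to names, so the extra memory grows with the number of matches rather than with the full 2500 x 2500 grid (the boolean mask itself still takes one byte per cell).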

answered Dec 1, 2020 at 16:39
