Get data from Multi-index dataframe based on numpy array

Question

From the following dataframe:

dim_0 dim_1                                             
0     0       40.54  23.40  6.70  1.70  1.82  0.96  1.62
      1      175.89  20.24  7.78  1.55  1.45  0.80  1.44
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
1     0       21.38  24.00  5.90  1.60  2.55  1.50  2.36
      1      130.29  18.40  8.49  1.52  1.45  0.80  1.47
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
2     0        6.30  25.70  5.60  1.70  2.16  1.16  1.87    
      1       73.45  21.49  6.88  1.61  1.61  0.94  1.63
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
3     0       16.64  25.70  5.70  1.60  2.17  1.12  1.76
      1      125.89  19.10  7.52  1.43  1.44  0.78  1.40
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
4     0       41.38  24.70  5.60  1.50  2.08  1.16  1.85
      1        0.00   0.00  0.00  0.00  0.00  0.00  0.00
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
5     0      180.59  16.40  3.80  1.10  4.63  3.86  5.71
      1        0.00   0.00  0.00  0.00  0.00  0.00  0.00
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
6     0       13.59  24.40  6.10  1.70  2.62  1.51  2.36
      1      103.19  19.02  8.70  1.53  1.48  0.76  1.38
      2        0.00   0.00  0.00  0.00  0.00  0.00  0.00
7     0        3.15  24.70  5.60  1.50  2.14  1.22  2.00
      1       55.90  23.10  6.07  1.50  1.86  1.12  1.87
      2      208.04  20.39  6.82  1.35  1.47  0.95  1.67

How can I get, for each dim_0 group, only the row whose dim_1 value matches the corresponding element of the array [1 0 0 1 2 0 1 2]?

Desired result is:

 0      175.89  20.24  7.78  1.55  1.45  0.80  1.44
 1       21.38  24.00  5.90  1.60  2.55  1.50  2.36
 2        6.30  25.70  5.60  1.70  2.16  1.16  1.87
 3      125.89  19.10  7.52  1.43  1.44  0.78  1.40
 4        0.00   0.00  0.00  0.00  0.00  0.00  0.00
 5      180.59  16.40  3.80  1.10  4.63  3.86  5.71
 6      103.19  19.02  8.70  1.53  1.48  0.76  1.38
 7      208.04  20.39  6.82  1.35  1.47  0.95  1.67

I've tried slicing, cross-sections, etc., but without success.

Thanks in advance for the help.
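To make the question reproducible, here is a minimal sketch that rebuilds the frame using only its first data column (the other six columns are left out for brevity, and the column name `v` is a hypothetical placeholder). It then selects one `(dim_0, dim_1)` label per group by passing a list of tuples to `.loc`, which reproduces the first column of the desired result.

```python
import numpy as np
import pandas as pd

# First data column of the question's frame only; "v" is a hypothetical column name.
values = [
    40.54, 175.89, 0.00,   # dim_0 = 0
    21.38, 130.29, 0.00,   # dim_0 = 1
    6.30, 73.45, 0.00,     # dim_0 = 2
    16.64, 125.89, 0.00,   # dim_0 = 3
    41.38, 0.00, 0.00,     # dim_0 = 4
    180.59, 0.00, 0.00,    # dim_0 = 5
    13.59, 103.19, 0.00,   # dim_0 = 6
    3.15, 55.90, 208.04,   # dim_0 = 7
]
index = pd.MultiIndex.from_product([range(8), range(3)], names=["dim_0", "dim_1"])
df = pd.DataFrame({"v": values}, index=index)

arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])

# One (dim_0, dim_1) label per group: group k keeps the row whose dim_1 == arr[k]
wanted = list(zip(range(len(arr)), arr.tolist()))
result = df.loc[wanted].reset_index(drop=True)
print(result)
```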

jezrael · Accepted Answer · 2020-09-10 08:15:47Z

Use MultiIndex.from_arrays to pair each dim_0 label with its array element, then select those (dim_0, dim_1) pairs with DataFrame.loc:

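import numpy as np
import pandas as pd

arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])

# pair each dim_0 label with the dim_1 value to pick, then look those pairs up
df = df.loc[pd.MultiIndex.from_arrays([df.index.levels[0], arr])]
print(df)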
          2      3     4     5     6     7     8
0                                               
0 1  175.89  20.24  7.78  1.55  1.45  0.80  1.44
1 0   21.38  24.00  5.90  1.60  2.55  1.50  2.36
2 0    6.30  25.70  5.60  1.70  2.16  1.16  1.87
3 1  125.89  19.10  7.52  1.43  1.44  0.78  1.40
4 2    0.00   0.00  0.00  0.00  0.00  0.00  0.00
5 0  180.59  16.40  3.80  1.10  4.63  3.86  5.71
6 1  103.19  19.02  8.70  1.53  1.48  0.76  1.38
7 2  208.04  20.39  6.82  1.35  1.47  0.95  1.67

arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])
# same lookup on the original df, then drop the dim_1 level so only dim_0 remains
df = df.loc[pd.MultiIndex.from_arrays([df.index.levels[0], arr])].droplevel(1)
print(df)
        2      3     4     5     6     7     8
0                                             
0  175.89  20.24  7.78  1.55  1.45  0.80  1.44
1   21.38  24.00  5.90  1.60  2.55  1.50  2.36
2    6.30  25.70  5.60  1.70  2.16  1.16  1.87
3  125.89  19.10  7.52  1.43  1.44  0.78  1.40
4    0.00   0.00  0.00  0.00  0.00  0.00  0.00
5  180.59  16.40  3.80  1.10  4.63  3.86  5.71
6  103.19  19.02  8.70  1.53  1.48  0.76  1.38
7  208.04  20.39  6.82  1.35  1.47  0.95  1.67
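One caveat with `df.index.levels[0]`: a `MultiIndex` keeps all of its level labels even after rows are filtered out, so on a filtered frame `levels[0]` can be longer than the array, and `from_arrays` then rejects the mismatched lengths. The sketch below uses a hypothetical 8×3 frame with placeholder values (not the question's data) and compares it with `df.index.unique(level=0)`, which returns only the labels actually present.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the question's: 8 dim_0 groups x 3 dim_1 rows,
# with row numbers as placeholder values.
index = pd.MultiIndex.from_product([range(8), range(3)], names=["dim_0", "dim_1"])
df = pd.DataFrame({"v": np.arange(24)}, index=index)

# Keep only the first four groups; the level still remembers all 8 labels.
sub = df[df.index.get_level_values("dim_0") < 4]
print(list(sub.index.levels[0]))        # unused labels 4..7 are still listed
print(list(sub.index.unique(level=0)))  # only the labels actually present

arr = np.array([1, 0, 0, 1])  # one dim_1 choice per remaining group
picked = sub.loc[pd.MultiIndex.from_arrays([sub.index.unique(level=0), arr])]
print(picked)
```

Calling `sub.index.remove_unused_levels()` is another way to drop the stale labels before reading `levels[0]`.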

yatu · Accepted Answer · 2020-09-10 08:07:20Z


I'd go with NumPy advanced indexing: compute the flat row position of each selected row and index the underlying array directly:

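import numpy as np
import pandas as pd

l = [1, 0, 0, 1, 2, 0, 1, 2]

i, j = df.index.levels
# flat row position = group number * rows per group + offset within the group
# (assumes every dim_0 group has exactly j.max()+1 rows, in sorted order)
ix = np.array(l) + np.arange(i.max() + 1) * (j.max() + 1)
pd.DataFrame(df.to_numpy()[ix])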

       0      1     2     3     4     5     6
0  175.89  20.24  7.78  1.55  1.45  0.80  1.44
1   21.38  24.00  5.90  1.60  2.55  1.50  2.36
2    6.30  25.70  5.60  1.70  2.16  1.16  1.87
3  125.89  19.10  7.52  1.43  1.44  0.78  1.40
4    0.00   0.00  0.00  0.00  0.00  0.00  0.00
5  180.59  16.40  3.80  1.10  4.63  3.86  5.71
6  103.19  19.02  8.70  1.53  1.48  0.76  1.38
7  208.04  20.39  6.82  1.35  1.47  0.95  1.67
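The offset arithmetic above is a row-major flattening of `(group, position within group)` pairs, so NumPy's `np.ravel_multi_index` should give the same positions. A small worked sketch, assuming the question's layout of 8 groups with 3 rows each:

```python
import numpy as np

l = [1, 0, 0, 1, 2, 0, 1, 2]
n_groups, group_size = 8, 3  # dim_0 takes 0..7, dim_1 takes 0..2 in the question

# offset within the group + start row of the group
ix = np.array(l) + np.arange(n_groups) * group_size
print(ix)  # [ 1  3  6 10 14 15 19 23]

# The same row-major flattening via NumPy's helper
ix2 = np.ravel_multi_index((np.arange(n_groups), l), (n_groups, group_size))
print(ix2)
```

With its default `mode='raise'`, `ravel_multi_index` raises a `ValueError` if an element of `l` is not smaller than `group_size`, whereas the hand-written formula would silently index into the next group.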


Roman_N · Accepted Answer · 2020-09-10 08:11:34Z


Try the following code:

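import pandas as pd

mask_array = [1, 0, 0, 1, 2, 0, 1, 2]

df_first = pd.DataFrame() # < It's your first array >, with dim_1 as a column (e.g. after reset_index())

# note: isin() only tests membership in {0, 1, 2}, so it does not pair each
# element with its dim_0 group; on the question's data every row is kept
new_array = df_first[df_first['dim_1'].isin(mask_array)]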
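A boolean mask can still work if each row's dim_1 value is compared with the array element for *its own* dim_0 group rather than with the whole array. A minimal sketch on a hypothetical 8×3 frame with placeholder values, assuming the dim_0 labels are exactly 0..len(mask_array)-1 so they can be used as positions into the array:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the question's, with row numbers as placeholder values
index = pd.MultiIndex.from_product([range(8), range(3)], names=["dim_0", "dim_1"])
df = pd.DataFrame({"v": np.arange(24)}, index=index)

mask_array = np.array([1, 0, 0, 1, 2, 0, 1, 2])

# isin() alone keeps every row, because every dim_1 value appears somewhere in mask_array
print(df[df.index.get_level_values("dim_1").isin(mask_array)].shape[0])  # 24

# Position-wise: row (k, j) is kept only when j == mask_array[k]
dim_0 = df.index.get_level_values("dim_0").to_numpy()
dim_1 = df.index.get_level_values("dim_1").to_numpy()
keep = dim_1 == mask_array[dim_0]
print(df[keep])
```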
