How to exclude some columns from a pandas dataframe with python

Question

I have a dataframe with shape (1000, 200), i.e. 1000 rows and 200 columns.

How do I find the most frequent value in each row and add that value to a new column?

I want to exclude the first 5 columns from the calculation.

The code:

      df['Mode'] = df.mode(axis=1).iloc[:, 0]

does not work as required: it includes all the columns.
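To make the problem concrete, here is a small, made-up frame (the column names and values are assumptions for illustration only) in which the first 5 columns hold a filler value of 0. Because `df.mode(axis=1)` counts every column, the filler wins in every row, even though the last three columns alone would give 1 and 3:

```python
import pandas as pd

# Hypothetical frame: the first 5 columns hold a filler value (0) that should
# not be counted; the last 3 columns hold the values of interest.
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 1, 1, 2],
     [0, 0, 0, 0, 0, 3, 2, 3]],
    columns=list("abcdefgh"),
)

# The question's one-liner: the mode is taken over all 8 columns.
all_cols = df.mode(axis=1).iloc[:, 0]
print(all_cols.tolist())
```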

ombk · Answer · 2020-12-06 11:07:59Z

1

import pandas as pd

df = pd.DataFrame({"Bye": [1, 42, 35, 5], "c": [1, 2, 3, 3], "d": [1, 2, 6, 3], "f": [1, 2, 3, 3], "e": [1, 4, 3, 3]})
# output
    Bye c   d   f   e
0   1   1   1   1   1
1   42  2   2   2   4
2   35  3   6   3   3
3   5   3   3   3   3



df.iloc[:, 2:].mode(axis=1)  # skip the first 2 columns (Bye and c), mode per row
#output
0
0   1
1   2
2   3
3   3

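First select the columns over which you want to calculate the mode.

df.iloc[:,2:] takes all rows of df but only the columns from position 2 onward, so the first two columns (Bye and c) are skipped. Then we apply mode with axis=1 so that the mode is calculated across each row rather than down each column. For the question's data, df.iloc[:,5:] skips the first five columns.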
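Applied to the question's case (skip the first 5 columns, keep one mode per row), a sketch on a hypothetical 8-column frame could look like this. It also shows a label-based alternative with `df.drop(columns=...)`, which removes the same columns by name instead of by position:

```python
import pandas as pd

# Hypothetical frame: the first 5 columns are to be ignored (they hold 0s here),
# the last 3 columns hold the values whose row-wise mode we want.
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 1, 1, 2],
     [0, 0, 0, 0, 0, 3, 2, 3]],
    columns=list("abcdefgh"),
)

# Positional: skip the first 5 columns, take the mode across each row, and keep
# only the first mode column (the smallest mode when a row has ties).
by_position = df.iloc[:, 5:].mode(axis=1).iloc[:, 0]

# Label-based alternative: drop the first 5 column labels instead.
by_label = df.drop(columns=df.columns[:5]).mode(axis=1).iloc[:, 0]

df["Mode"] = by_position
print(df)
```

Once `Mode` has been appended as the last column, rerunning the `iloc[:, 5:]` line would include `Mode` in the slice, so compute the result before adding new columns or select the data columns explicitly.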

Axis in pandas
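As a quick illustration of what `axis` changes (toy data assumed, chosen so that no row or column has a tie): `axis=0` gives one mode per column, `axis=1` gives one mode per row.

```python
import pandas as pd

# Toy data chosen so that no column and no row has a tie.
df = pd.DataFrame({"x": [1, 1, 3], "y": [1, 2, 2], "z": [2, 2, 3]})

per_column = df.mode(axis=0)  # default: down each column -> 1 row x 3 columns
per_row = df.mode(axis=1)     # across each row -> 3 rows x 1 column

print(per_column)
print(per_row)
```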

Multiple Modes

df77  = pd.DataFrame({"Bye":[1,42,35,5],"c":[1,42,3,3],"d":[2,7,6,3],"f":[2,7,3,3],"e":[3,4,3,3]})
df77

    Bye c   d   f   e
0   1   1   2   2   3
1   42  42  7   7   4
2   35  3   6   3   3
3   5   3   3   3   3

all_modes = df77.mode(axis=1)
all_modes.columns = ["mode1", "mode2"]  # rename: rows with tied values produce a second mode column

    mode1   mode2
0   1.0      2.0
1   7.0     42.0
2   3.0      NaN # indicates only 1 mode found
3   3.0      NaN


pd.concat([df77,all_modes],axis=1)

    Bye c   d   f   e   mode1   mode2
0   1   1   2   2   3    1.0    2.0
1   42  42  7   7   4    7.0    42.0
2   35  3   6   3   3    3.0    NaN
3   5   3   3   3   3    3.0    NaN
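Hard-coding two names raises a length-mismatch error if some row has three or more tied modes. A sketch that names however many mode columns `mode(axis=1)` returns, and optionally gathers each row's modes into one list-valued column (the `modes` column name is an assumption):

```python
import pandas as pd

df77 = pd.DataFrame({"Bye": [1, 42, 35, 5], "c": [1, 42, 3, 3],
                     "d": [2, 7, 6, 3], "f": [2, 7, 3, 3], "e": [3, 4, 3, 3]})

# mode(axis=1) returns as many columns as the largest number of tied modes in
# any row, so build the names from the actual column count.
all_modes = df77.mode(axis=1)
all_modes.columns = [f"mode{i + 1}" for i in range(all_modes.shape[1])]

# Optional: one list per row, with the NaN padding removed.
df77["modes"] = all_modes.apply(lambda row: row.dropna().tolist(), axis=1)
print(df77)
```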

edited Dec 6, 2020 at 11:07

answered Dec 5, 2020 at 17:42

ombk



17 Comments

codingXP Over a year ago

Thanks @ombk for the answer. Still, when I start from column 4 onward and run the code on my data, I don't get a single result but 5 columns, numbered 0 to 4. Note that there are also NaN values in the rows.

ombk Over a year ago

This is not possible; you are probably writing it wrong.

ombk Over a year ago

@codingXP Yes, as I said in my previous comment, you are not writing the code correctly. I applied the answer you accepted, which is essentially the same as mine but more complicated, and it gives the same result.

codingXP Over a year ago

This is what I am writing: print(df.iloc[:,5:].mode(axis=1)), to start from column no. 6. I believe the first column of the output holds the result, but why is it also showing other columns?

codingXP Over a year ago

There are 5 columns in total in the result when I print it. I think the remaining columns show only NaN values. Maybe putting dropna=True in .mode() would remove the extra columns, or maybe setting numeric_only to True.
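The extra columns are expected: `mode(axis=1)` returns one column per tied mode, and rows with fewer modes are padded with NaN. Missing values in the data are already skipped because `dropna=True` is the default, and `numeric_only=True` only restricts the calculation to numeric columns, so neither option removes the extra mode columns. Taking `.iloc[:, 0]` keeps the first (smallest) mode of each row. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 0 has a tie (1 and 2 both appear twice),
# row 1 has a single mode (5) plus a missing value.
df = pd.DataFrame([[1, 1, 2, 2], [5, 5, np.nan, 7]])

modes = df.mode(axis=1)    # one column per tied mode; NaN pads row 1
first = modes.iloc[:, 0]   # first (smallest) mode of each row
print(modes)
print(first)
```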


Aaj Kaal · Accepted Answer · 2020-12-06 23:23:27Z

1

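Use apply with axis=1 so the function runs once per row, skip the first 5 positions with .iloc[5:], and take the mode of what remains. (An earlier version of this answer used value_counts().idxmax(), which returns only one value when a row has several equally frequent values; see the comments below.)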

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(low=1,high=6,size=(5,35)), columns=range(35))
print(df)
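# .iloc[5:] skips the first 5 positions of each row; mode() may return several
# tied values, so .iloc[0] keeps the smallest one and each row yields a scalar
df['freq'] = df.apply(lambda x: x.iloc[5:].mode().iloc[0], axis=1)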
print(df)

Output:

   0   1   2   3   4   5   6   7   8   9   10  11  12  13  ...  21  22  23  24  25  26  27  28  29  30  31  32  33  34
0   1   4   4   1   1   5   1   2   3   2   1   1   2   2  ...   1   5   3   2   2   1   2   5   5   4   4   4   2   3
1   5   1   1   4   5   2   3   4   1   2   4   5   2   3  ...   2   5   1   5   3   4   1   5   5   3   2   4   1   3
2   5   2   1   3   1   2   2   5   5   4   5   5   1   2  ...   3   3   5   5   1   4   2   4   3   2   2   4   3   3
3   4   4   2   2   3   4   5   1   3   1   2   5   4   5  ...   3   4   5   3   3   5   2   1   5   1   1   4   4   3
4   5   3   1   5   2   4   2   5   3   4   1   3   4   1  ...   3   1   4   4   3   1   5   4   3   2   2   1   3   3

[5 rows x 35 columns]
   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  ...  21  22  23  24  25  26  27  28  29  30  31  32  33  34  freq
0  1  4  4  1  1  5  1  2  3  2   1   1   2   2   1  ...   1   5   3   2   2   1   2   5   5   4   4   4   2   3     2
1  5  1  1  4  5  2  3  4  1  2   4   5   2   3   1  ...   2   5   1   5   3   4   1   5   5   3   2   4   1   3     5
2  5  2  1  3  1  2  2  5  5  4   5   5   1   2   2  ...   3   3   5   5   1   4   2   4   3   2   2   4   3   3     2
3  4  4  2  2  3  4  5  1  3  1   2   5   4   5   4  ...   3   4   5   3   3   5   2   1   5   1   1   4   4   3     5
4  5  3  1  5  2  4  2  5  3  4   1   3   4   1   1  ...   3   1   4   4   3   1   5   4   3   2   2   1   3   3     3

[5 rows x 36 columns]
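To check that this row-wise apply agrees with the question's one-liner restricted to columns 5 onward, here is a seeded sketch (the seed and the use of `np.random.default_rng` are assumptions, so the numbers differ from the output above). Both versions keep the smallest mode when a row has ties, because `mode()` returns its values sorted:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the comparison is repeatable
df = pd.DataFrame(rng.integers(low=1, high=6, size=(5, 35)))

# Row-wise apply: skip the first 5 positions, take the mode,
# keep the smallest value if several are tied.
via_apply = df.apply(lambda x: x.iloc[5:].mode().iloc[0], axis=1)

# One-liner form: slice the columns once, then keep the first mode column.
via_mode = df.iloc[:, 5:].mode(axis=1).iloc[:, 0]

print(pd.DataFrame({"apply": via_apply, "mode": via_mode}))
```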

edited Dec 6, 2020 at 23:23

answered Dec 5, 2020 at 18:07

Aaj Kaal


2 Comments

ombk Over a year ago

@Aaj Kaal, your method is basically wrong, and here is why: it doesn't take into account rows with multiple modes; it just returns one of the most frequent elements. It is sad it was chosen as the answer and got 2 upvotes.

Aaj Kaal Over a year ago

@ombk Thanks for bringing it up. Corrected it.
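To see the difference ombk describes, compare `value_counts().idxmax()` with `mode()` on a made-up row where two values tie. `idxmax()` can only return a single label, while `mode()` returns every tied value:

```python
import pandas as pd

# Hypothetical row in which 1 and 2 both appear twice.
row = pd.Series([2, 1, 2, 1, 3])

one_label = row.value_counts().idxmax()  # a single label; the tie is hidden
all_modes = row.mode().tolist()          # every tied value, sorted
print(one_label, all_modes)
```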
