
I have a DataFrame with shape (1000, 200).

Given 1000 rows and 200 columns, how do I find the most frequent value in each row and add that value as a new column?

I want to exclude the first 5 columns from the calculation.

The code:

      df['Mode'] = df.mode(axis=1).iloc[:, 0]

does not work as required because it includes all the columns.

2 Answers

import pandas as pd

df = pd.DataFrame({"Bye":[1,42,35,5],"c":[1,2,3,3],"d":[1,2,6,3],"f":[1,2,3,3],"e":[1,4,3,3]})
# output
    Bye c   d   f   e
0   1   1   1   1   1
1   42  2   2   2   4
2   35  3   6   3   3
3   5   3   3   3   3



df.iloc[:,2:].mode(axis=1)
#output
0
0   1
1   2
2   3
3   3

First select the columns over which you want to calculate the mode.

df.iloc[:,2:] takes all the rows of df and only the columns from position 2 onward (positions are zero-based, so the first two columns are skipped). Then we apply mode with axis=1 to calculate it across each row. To skip the first 5 columns, as in the question, use df.iloc[:,5:].
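
As a minimal sketch of the same idea applied to the question (skip the first 5 columns and store the result in a new column), assuming a small made-up frame whose column names a to e and x to z are purely illustrative:

```python
import pandas as pd

# Hypothetical data: columns a..e are the 5 columns to ignore,
# x..z are the columns the mode should be computed over.
df = pd.DataFrame({
    "a": [9, 9, 9], "b": [9, 9, 9], "c": [9, 9, 9],
    "d": [9, 9, 9], "e": [9, 9, 9],
    "x": [1, 2, 3], "y": [1, 2, 4], "z": [5, 2, 4],
})

# Slice away the first 5 columns, then take the row-wise mode.
# mode(axis=1) returns a DataFrame, so keep only its first column.
row_modes = df.iloc[:, 5:].mode(axis=1)
df["Mode"] = row_modes.iloc[:, 0]

print(df)
```

Without the slice, every row's mode here would be 9, which is the behaviour the question complains about.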


Multiple Modes

df77  = pd.DataFrame({"Bye":[1,42,35,5],"c":[1,42,3,3],"d":[2,7,6,3],"f":[2,7,3,3],"e":[3,4,3,3]})
df77

    Bye c   d   f   e
0   1   1   2   2   3
1   42  42  7   7   4
2   35  3   6   3   3
3   5   3   3   3   3

all_modes = df77.mode(axis=1)
all_modes.columns = ["mode1","mode2"] #renaming the modes because some rows had multiple ones

    mode1   mode2
0   1.0      2.0
1   7.0     42.0
2   3.0      NaN # indicates only 1 mode found
3   3.0      NaN


pd.concat([df77,all_modes],axis=1)

    Bye c   d   f   e   mode1   mode2
0   1   1   2   2   3    1.0    2.0
1   42  42  7   7   4    7.0    42.0
2   35  3   6   3   3    3.0    NaN
3   5   3   3   3   3    3.0    NaN
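
The NaN padding above is also why mode(axis=1) can return several columns even when only one value per row is wanted. Here is a hedged sketch, reusing df77, of three ways to reduce that output to one entry per row. The names first_mode, n_modes and mode_lists are illustrative, not part of pandas:

```python
import pandas as pd

df77 = pd.DataFrame({"Bye": [1, 42, 35, 5], "c": [1, 42, 3, 3],
                     "d": [2, 7, 6, 3], "f": [2, 7, 3, 3], "e": [3, 4, 3, 3]})

all_modes = df77.mode(axis=1)

# Option 1: keep only the first column (the smallest mode of each row).
first_mode = all_modes.iloc[:, 0]

# Option 2: count how many modes each row has by counting non-NaN cells.
n_modes = all_modes.notna().sum(axis=1)

# Option 3: gather every mode of a row into a list, dropping the NaN padding.
mode_lists = [row.dropna().tolist() for _, row in all_modes.iterrows()]

print(first_mode.tolist())
print(n_modes.tolist())
print(mode_lists)
```

Which option fits depends on whether ties matter for the analysis; option 1 silently discards them.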

17 Comments

Thanks @ombk for the answer. Still, when I start from column 4 onward and run the code on my data, I do not get a single result but 5 columns, labelled 0 to 4. Note that there are also NaN values in the rows.
This is not possible; you are probably writing it wrong.
@codingXP yes, as I said in my previous comment, you are not writing the code correctly. I applied the answer you accepted, which is obviously the same as mine but more complicated, and it gets the same result.
This is what I am writing: print(df.iloc[:,5:].mode(axis=1)), to start from column no. 6. I believe the first column of the output holds the result, but why is it also showing other columns?
There are 5 columns in total in the result when I print it. I think the rest of the columns show only NaN values. Maybe if I put dropna=True in .mode() it would remove the rest of the columns, or maybe setting numeric_only to True would.

You need value_counts().idxmax(), or mode() as in the code below, and make sure to apply it with axis=1.

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(low=1,high=6,size=(5,35)), columns=range(35))
print(df)
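# Mode of each row from column position 5 onward; .iloc[0] keeps the
# first (smallest) mode so a row with ties still yields a single value.
df['freq'] = df.apply(lambda x: x.iloc[5:].mode().iloc[0], axis=1)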
print(df)

Output:

   0   1   2   3   4   5   6   7   8   9   10  11  12  13  ...  21  22  23  24  25  26  27  28  29  30  31  32  33  34
0   1   4   4   1   1   5   1   2   3   2   1   1   2   2  ...   1   5   3   2   2   1   2   5   5   4   4   4   2   3
1   5   1   1   4   5   2   3   4   1   2   4   5   2   3  ...   2   5   1   5   3   4   1   5   5   3   2   4   1   3
2   5   2   1   3   1   2   2   5   5   4   5   5   1   2  ...   3   3   5   5   1   4   2   4   3   2   2   4   3   3
3   4   4   2   2   3   4   5   1   3   1   2   5   4   5  ...   3   4   5   3   3   5   2   1   5   1   1   4   4   3
4   5   3   1   5   2   4   2   5   3   4   1   3   4   1  ...   3   1   4   4   3   1   5   4   3   2   2   1   3   3

[5 rows x 35 columns]
   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  ...  21  22  23  24  25  26  27  28  29  30  31  32  33  34  freq
0  1  4  4  1  1  5  1  2  3  2   1   1   2   2   1  ...   1   5   3   2   2   1   2   5   5   4   4   4   2   3     2
1  5  1  1  4  5  2  3  4  1  2   4   5   2   3   1  ...   2   5   1   5   3   4   1   5   5   3   2   4   1   3     5
2  5  2  1  3  1  2  2  5  5  4   5   5   1   2   2  ...   3   3   5   5   1   4   2   4   3   2   2   4   3   3     2
3  4  4  2  2  3  4  5  1  3  1   2   5   4   5   4  ...   3   4   5   3   3   5   2   1   5   1   1   4   4   3     5
4  5  3  1  5  2  4  2  5  3  4   1   3   4   1   1  ...   3   1   4   4   3   1   5   4   3   2   2   1   3   3     3

[5 rows x 36 columns]
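
The value_counts().idxmax() idea named in this answer can be sketched as follows. This is an illustration on made-up data rather than the answerer's exact code, and it returns exactly one value per row, so ties are not reported:

```python
import pandas as pd

# Hypothetical frame: the first 5 columns (all zeros) should be ignored.
df = pd.DataFrame([
    [0, 0, 0, 0, 0, 1, 1, 2],
    [0, 0, 0, 0, 0, 3, 3, 3],
    [0, 0, 0, 0, 0, 4, 5, 5],
])

# For each row, count the values in columns 5 onward and take the label
# with the highest count.
df["freq"] = df.iloc[:, 5:].apply(lambda row: row.value_counts().idxmax(), axis=1)

print(df)
```

When a row has several equally frequent values, which one idxmax picks should not be relied on; mode() is the safer choice if ties need to be visible.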

2 Comments

@aaj kaal, basically your method is wrong, and let me tell you why: it doesn't take into account that a row can have multiple modes; it just takes the last most frequent element. It is sad that it was chosen as the answer and got 2 upvotes.
@ombk thanks for bringing it up. Corrected it.
