I am trying to count the number of True/False values in a data frame like this:

import pandas as pd

df = pd.DataFrame({'a': [True, False, True],
                  'b': [True, True, True],
                  'c': [False, False, True]})
count_cols = ['a', 'b', 'c']
df['count'] = df[df[count_cols] == True].count(axis=1)

This works fine on this example. But when I run it on my actual df (shape (25168, 303)), I get the following error: ValueError: cannot reindex from a duplicate axis

I understand from "What does `ValueError: cannot reindex from a duplicate axis` mean?" that this usually occurs when there are duplicate values in the index. I have tried both df.reindex() and df[~df.index.duplicated()], but I still get the same error message.

  • Have you tried df.sum(axis=1)? Commented Sep 9, 2019 at 10:07
  • Thanks, but that throws the same error. Commented Sep 9, 2019 at 10:57

1 Answer

Select the columns in the list and count the True values with sum, since each True is treated as 1:

df['count'] = df[count_cols].sum(axis=1)
print (df)
       a     b      c  count
0   True  True  False      2
1  False  True  False      1
2   True  True   True      3
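
Since the question asks about both True and False values, here is a minimal sketch of counting each per row. It reuses the example frame from the question; the names true_counts and false_counts are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'a': [True, False, True],
                   'b': [True, True, True],
                   'c': [False, False, True]})
count_cols = ['a', 'b', 'c']

# Booleans sum as integers: True counts as 1, False as 0.
true_counts = df[count_cols].sum(axis=1)

# Invert the boolean columns first to count the False values instead.
false_counts = (~df[count_cols]).sum(axis=1)

print(true_counts.tolist())
print(false_counts.tolist())
```

This assumes the selected columns hold only booleans; columns with missing values or mixed types would need cleaning first.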

EDIT: To avoid the error, one possible solution is to convert the values to a NumPy array:

import numpy as np

df['count'] = np.sum(df[count_cols].values, axis=1)
print (df)
       a     b      c  count
0   True  True  False      2
1  False  True  False      1
2   True  True   True      3

3 Comments

Thanks, but df[count_cols].sum(axis=1) throws the same error
Perfect, thank you! Do you have any idea why mine works on the test df but not the big one?
@Maverick It seems to be some data-related issue; it is hard to know.