pandas: summing over multiple columns

Question

Consider this dataframe:

   STUDENT   T_1   T_2   T_3   T_4
0  A         PASS  FAIL  PASS  FAIL
1  B         PASS  FAIL  FAIL  FAIL
2  C         FAIL  FAIL  PASS  PASS
3  D         PASS  FAIL  PASS  PASS

Columns T_1 through T_4 represent tests. T_1 and T_3 are tests of type 'X'; T_2 and T_4 are tests of type 'Y'. The values are categorical (PASS/FAIL). I want the percentage distribution of PASS/FAIL per test type (X and Y), like this:

   STATUS   X            Y
0  PASS     0.75 (6/8)   0.25 (2/8)
1  FAIL     0.25 (2/8)   0.75 (6/8)

I know I can use s.value_counts() / s.count() on a series to get the status distribution for a single column, but how do I aggregate over multiple columns (i.e., combine T_1/T_3 and T_2/T_4, since I know each pair belongs to one test type)?
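For reference, the single-column approach mentioned above can be sketched as follows. This is only an illustration: the frame is rebuilt by hand from the table in the question, and the variable names (`per_column`, `per_column_norm`) are made up for the example.

```python
import pandas as pd

# The example frame from the question, typed in by hand.
df = pd.DataFrame({
    'STUDENT': ['A', 'B', 'C', 'D'],
    'T_1': ['PASS', 'PASS', 'FAIL', 'PASS'],
    'T_2': ['FAIL', 'FAIL', 'FAIL', 'FAIL'],
    'T_3': ['PASS', 'FAIL', 'PASS', 'PASS'],
    'T_4': ['FAIL', 'FAIL', 'PASS', 'PASS'],
})

# Status distribution for one column only.
s = df['T_1']
per_column = s.value_counts() / s.count()

# value_counts can also do the division itself.
per_column_norm = s.value_counts(normalize=True)

print(per_column)
```

This covers one test at a time; it does not yet pool T_1 with T_3 or T_2 with T_4, which is the part being asked about.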

Jianxun Li · Accepted Answer · 2015-06-17 00:38:24Z

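Here is one way to do this: label each test column with its type through a column MultiIndex, transpose, and group by that level.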

import pandas as pd
import numpy as np

# simulate data shaped like yours (10 students, 4 tests)
student_id = np.array('A B C D E F G H I J'.split()).reshape(10, 1)
test_results = np.random.choice(['PASS', 'FAIL'], size=(10, 4), p=[0.7, 0.3])
data = np.concatenate([student_id, test_results], axis=1)
df = pd.DataFrame(data, columns=['STUDENT', 'T_1', 'T_2', 'T_3', 'T_4'])

# set index as student names
df.set_index('STUDENT', inplace=True)
# label each test column with its type (X or Y) via a second column level
df.columns = pd.MultiIndex.from_tuples([('T_1', 'X'), ('T_2', 'Y'), ('T_3', 'X'), ('T_4', 'Y')])
# transpose so the tests become rows, then group by the type level
by = df.T.groupby(level=1)


def count_func(group):
    # group holds every test of one type (as rows, after the transpose)
    num_pass = (group.values == 'PASS').sum()
    num_fail = (group.values == 'FAIL').sum()
    total = num_pass + num_fail
    # report each rate as a fraction, followed by the raw counts
    pass_rate = '{:.2f} ({}/{})'.format(num_pass / total, num_pass, total)
    fail_rate = '{:.2f} ({}/{})'.format(num_fail / total, num_fail, total)

    return pd.Series({'FAIL_RATE': fail_rate, 'PASS_RATE': pass_rate})


result = by.apply(count_func)

Sample output (the data is random, so your numbers will differ):

Out[5]: 
     FAIL_RATE     PASS_RATE
X  0.25 (5/20)  0.75 (15/20)
Y  0.25 (5/20)  0.75 (15/20)
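As an alternative sketch, not part of the answer above, the same pooling can be done by reshaping to long format and letting `pd.crosstab` normalise per test type. The frame below is the four-student example from the question typed in by hand; the `test_type` mapping and the names `long_df`, `counts`, and `shares` are assumptions made for illustration.

```python
import pandas as pd

# The example frame from the question, typed in by hand.
df = pd.DataFrame({
    'STUDENT': ['A', 'B', 'C', 'D'],
    'T_1': ['PASS', 'PASS', 'FAIL', 'PASS'],
    'T_2': ['FAIL', 'FAIL', 'FAIL', 'FAIL'],
    'T_3': ['PASS', 'FAIL', 'PASS', 'PASS'],
    'T_4': ['FAIL', 'FAIL', 'PASS', 'PASS'],
})

# Assumed mapping from test column to test type, as described in the question.
test_type = {'T_1': 'X', 'T_2': 'Y', 'T_3': 'X', 'T_4': 'Y'}

# Long format: one row per (student, test) pair.
long_df = df.melt(id_vars='STUDENT', var_name='TEST', value_name='STATUS')
long_df['TYPE'] = long_df['TEST'].map(test_type)

# Raw counts per status and test type.
counts = pd.crosstab(long_df['STATUS'], long_df['TYPE'])

# Same table, with each test-type column normalised to sum to 1.
shares = pd.crosstab(long_df['STATUS'], long_df['TYPE'], normalize='columns')

print(counts)
print(shares)
```

On the question's data this gives a PASS share of 0.75 (6 of 8) for X and 0.25 (2 of 8) for Y, which matches the table requested. The result stays numeric, so it can be formatted or plotted afterwards instead of being stored as strings.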
