I have a df that looks like this
df
sample total count base positive negative group
A 698 696 A 347 349 group1
A 698 0 C 0 0 group1
A 698 1 G 0 1 group1
A 698 0 T 0 0 group1
A 698 0 N 0 0 group1
A 698 22 +G 14 8 group1
A 698 1 +GG 0 1 group1
A 698 1 -A 1 0 group1
A 707 1 A 1 0 group1
B 707 1 C 1 0 group1
B 707 693 G 340 353 group1
B 707 0 T 0 0 group1
B 707 0 N 0 0 group1
B 707 1 +A 0 1 group1
B 707 11 -G 6 5 group1
B 707 1 -GG 0 1 group1
A 814 804 A 417 387 group2
A 814 2 C 1 1 group2
A 814 7 G 2 5 group2
A 814 1 T 1 0 group2
A 814 0 N 0 0 group2
A 814 225 +G 114 111 group2
A 814 7 +GG 4 3 group2
B 826 0 A 0 0 group2
B 826 0 C 0 0 group2
B 826 815 G 416 399 group2
B 826 0 T 0 0 group2
B 826 0 N 0 0 group2
B 826 11 -G 6 5 group2
I would like to convert/pivot it to df with headers given below
expected output
group sample total +A +G +GG -A -G -GG A C G N T
group1 A 698 0 22 1 1 0 0 696 0 0 0 0
group1 B 707 1 0 0 0 11 1 1 1 693 0 0
group2 A 814 0 225 7 0 0 0 804 2 7 0 1
group2 B 826 0 0 0 0 11 0 0 0 815 0 0
So far adopting code shown here pandas row values to column headers
Using code:
_df = df.groupby(['group', 'base']).size().unstack(fill_value=0).reset_index().rename_axis(None, axis=1)
Output
_df
group +A +G +GG -A -G -GG A C G N T
0 group1 1 1 1 1 1 1 2 2 2 2 2
1 group2 0 1 1 0 1 0 2 2 2 2 2
I am not sure what inbuilt command does pandas have where i can actually get the number from original dataframe and not 0,1 also I miss the total column.