0

I have a df that looks like this

df
    sample  total   count   base    positive    negative    group
    A   698 696 A   347 349 group1
    A   698 0   C   0   0   group1
    A   698 1   G   0   1   group1
    A   698 0   T   0   0   group1
    A   698 0   N   0   0   group1
    A   698 22  +G  14  8   group1
    A   698 1   +GG 0   1   group1
    A   698 1   -A  1   0   group1
    A   707 1   A   1   0   group1
    B   707 1   C   1   0   group1
    B   707 693 G   340 353 group1
    B   707 0   T   0   0   group1
    B   707 0   N   0   0   group1
    B   707 1   +A  0   1   group1
    B   707 11  -G  6   5   group1
    B   707 1   -GG 0   1   group1
    A   814 804 A   417 387 group2
    A   814 2   C   1   1   group2
    A   814 7   G   2   5   group2
    A   814 1   T   1   0   group2
    A   814 0   N   0   0   group2
    A   814 225 +G  114 111 group2
    A   814 7   +GG 4   3   group2
    B   826 0   A   0   0   group2
    B   826 0   C   0   0   group2
    B   826 815 G   416 399 group2
    B   826 0   T   0   0   group2
    B   826 0   N   0   0   group2
    B   826 11  -G  6   5   group2

I would like to convert/pivot it to df with headers given below

expected output

group   sample  total   +A  +G  +GG -A  -G  -GG A   C   G   N   T
group1  A   698 0   22  1   1   0   0   696 0   0   0   0
group1  B   707 1   0   0   0   11  1   1   1   693 0   0
group2  A   814 0   225 7   0   0   0   804 2   7   0   1
group2  B   826 0   0   0   0   11  0   0   0   815 0   0

So far adopting code shown here pandas row values to column headers

Using code:

_df = df.groupby(['group', 'base']).size().unstack(fill_value=0).reset_index().rename_axis(None, axis=1)

Output

 _df
    group   +A  +G  +GG -A  -G  -GG A   C   G   N   T
0   group1  1   1   1   1   1   1   2   2   2   2   2
1   group2  0   1   1   0   1   0   2   2   2   2   2

I am not sure what inbuilt command does pandas have where i can actually get the number from original dataframe and not 0,1 also I miss the total column.

0

2 Answers 2

3

Use pivot table like this

_df = df.pivot_table(index=['group','sample','total'],columns=['base'],values='count').fillna(0)

Output:

base                  +A     +G  +GG   -A    -G  -GG      A    C      G    N  \
group  sample total                                                            
group1 A      698    0.0   22.0  1.0  1.0   0.0  0.0  696.0  0.0    1.0  0.0   
              707    0.0    0.0  0.0  0.0   0.0  0.0    1.0  0.0    0.0  0.0   
       B      707    1.0    0.0  0.0  0.0  11.0  1.0    0.0  1.0  693.0  0.0   
group2 A      814    0.0  225.0  7.0  0.0   0.0  0.0  804.0  2.0    7.0  0.0   
       B      826    0.0    0.0  0.0  0.0  11.0  0.0    0.0  0.0  815.0  0.0   

base                   T  
group  sample total       
group1 A      698    0.0  
              707    0.0  
       B      707    0.0  
group2 A      814    1.0  
       B      826    0.0  
Sign up to request clarification or add additional context in comments.

2 Comments

added expected output
Works like charm pandas is like a never ending tunnel so many comands
1
pd.pivot_table(_df, columns=['base'], values=['size'],index='index_name').fillna(0)

1 Comment

there is no column named size nor index name :(

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.