How to use Python for loop to execute the function inside?

Question

I am working on a survey and the data looks like this:

ID    Q1    Q2    Q3    Gender    Age    Dept
001   Y      N    Y      F         22     IT
002   N      Y    Y      M         35     HR
003   Y      N    N      F         20     IT
004   Y      N    Y      M         54     OPRE
005   Y      N    Y      M         42     OPRE

The codes are:

out = df.pivot_table(index='Q1', columns=['Gender'], values=['ID'], aggfunc='count', fill_value=0)
out = (out.join(out[['ID']].div(out['ID'].sum(axis=1).values, axis=0)
            .mul(100)
            .rename(columns={'ID':'%Respondents'})))
out

And the table I created is like this:

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

But I'd like all the Questions results be at one execution, like

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

Q2      #Res        %Rep
       M    F      M    F
Y      1    0      100  0
N      2    2      50   50

Q3      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      0    1      0    100

I want to write a function of creating the tables, and use a for loop to go over the questions. Can anyone help?

Oh, yes. I made some corrections. Sorry, I marked it by mistake. — Sandy
– Sandy, Commented Jun 27, 2022 at 19:43

Naveed · Accepted Answer · 2022-06-24 19:54:06Z

1

Here is one way to do it


df2=df[['ID','Gender','Q1','Q2','Q3']].melt(
    ['ID','Gender'], var_name='question', value_name='response'
).pivot_table(
    index=['question','response'], columns='Gender', values='ID', aggfunc='count' ).fillna(0)

df2['%Rep_F'] = df2['F'] /(df2['M']+df2['F'] )*100
df2['%Rep_M'] = df2['M'] /(df2['M']+df2['F'] )*100
df2.sort_values(['question','response'], ascending=[1,0]).astype({'F':int,'M':int})


           Gender    F  M   %Rep_F  %Rep_M
question  response              
   Q1       Y        2  2    50.0    50.0
            N        0  1     0.0   100.0
   Q2       Y        0  1     0.0   100.0
            N        2  2    50.0    50.0
   Q3       Y        1  3    25.0    75.0
            N        1  0   100.0     0.0

edited Jun 24, 2022 at 19:54

answered Jun 24, 2022 at 19:14

Naveed

11.7k2 gold badges16 silver badges21 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

sitting_duck Over a year ago

Excellent. Maybe slap .astype({'F':int,'M':int}) on the tail-end of that.

Naveed Over a year ago

good point @jch. just updated the solution. Regards

sushant patil · Accepted Answer · 2022-06-24 19:26:40Z

0

You could use this function:

def aggregate_over_questions( df ):
    data = pd.DataFrame()
    #get the number of questions
    questions = df.loc[:,df.columns.str.contains("^Q")].columns.tolist()

    for q in questions:
        temp = df.pivot_table(index=f"{q}", columns=['Gender'], values['ID'], aggfunc='count', fill_value=0)
        temp = (temp.join(out[['ID']].div(temp['ID'].sum(axis=1).values, axis=0)
                    .mul(100)
                    .rename(columns={'ID':'%Respondents'})))
       data = pd.concat( [data, temp], axis=0 )
    return data

Image

edited Jun 24, 2022 at 19:26

answered Jun 24, 2022 at 19:21

sushant patil

185 bronze badges

Collectives™ on Stack Overflow

How to use Python for loop to execute the function inside?

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related