1

I am working on a survey and the data looks like this:

ID    Q1    Q2    Q3    Gender    Age    Dept
001   Y      N    Y      F         22     IT
002   N      Y    Y      M         35     HR
003   Y      N    N      F         20     IT
004   Y      N    Y      M         54     OPRE
005   Y      N    Y      M         42     OPRE

The codes are:

out = df.pivot_table(index='Q1', columns=['Gender'], values=['ID'], aggfunc='count', fill_value=0)
out = (out.join(out[['ID']].div(out['ID'].sum(axis=1).values, axis=0)
            .mul(100)
            .rename(columns={'ID':'%Respondents'})))
out

And the table I created is like this:

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

But I'd like all the Questions results be at one execution, like

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

Q2      #Res        %Rep
       M    F      M    F
Y      1    0      100  0
N      2    2      50   50

Q3      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      0    1      0    100

I want to write a function of creating the tables, and use a for loop to go over the questions. Can anyone help?

3
  • can you describe your query? Commented Jun 24, 2022 at 18:34
  • added a solution, does that help answer? Commented Jun 24, 2022 at 19:41
  • Oh, yes. I made some corrections. Sorry, I marked it by mistake. Commented Jun 27, 2022 at 19:43

2 Answers 2

1

Here is one way to do it


df2=df[['ID','Gender','Q1','Q2','Q3']].melt(
    ['ID','Gender'], var_name='question', value_name='response'
).pivot_table(
    index=['question','response'], columns='Gender', values='ID', aggfunc='count' ).fillna(0)

df2['%Rep_F'] = df2['F'] /(df2['M']+df2['F'] )*100
df2['%Rep_M'] = df2['M'] /(df2['M']+df2['F'] )*100
df2.sort_values(['question','response'], ascending=[1,0]).astype({'F':int,'M':int})



           Gender    F  M   %Rep_F  %Rep_M
question  response              
   Q1       Y        2  2    50.0    50.0
            N        0  1     0.0   100.0
   Q2       Y        0  1     0.0   100.0
            N        2  2    50.0    50.0
   Q3       Y        1  3    25.0    75.0
            N        1  0   100.0     0.0


Sign up to request clarification or add additional context in comments.

2 Comments

Excellent. Maybe slap .astype({'F':int,'M':int}) on the tail-end of that.
good point @jch. just updated the solution. Regards
0

You could use this function:

def aggregate_over_questions( df ):
    data = pd.DataFrame()
    #get the number of questions
    questions = df.loc[:,df.columns.str.contains("^Q")].columns.tolist()

    for q in questions:
        temp = df.pivot_table(index=f"{q}", columns=['Gender'], values['ID'], aggfunc='count', fill_value=0)
        temp = (temp.join(out[['ID']].div(temp['ID'].sum(axis=1).values, axis=0)
                    .mul(100)
                    .rename(columns={'ID':'%Respondents'})))
       data = pd.concat( [data, temp], axis=0 )
    return data

Image

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.