
Could you please help me work out this calculation?

I have a table (shown as an image in the original post, not reproduced here).

What I need to do is to calculate the expected frequency as (row total * col total) / grand total
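
As a minimal sketch of that formula in plain Python (no pandas), assuming the observed counts are the ones that appear in the printed output further down this page, since the original image is not available. The variable names here are illustrative only:

```python
# Observed counts (assumed from the printed output on this page):
# columns are satisfied, neutral, dissatisfied.
observed = [
    [30, 17, 8],
    [140, 127, 58],
]

# Marginal totals.
row_totals = [sum(row) for row in observed]        # one total per row
col_totals = [sum(col) for col in zip(*observed)]  # one total per column
grand_total = sum(row_totals)

# Expected frequency for each cell: (row total * col total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 6) for value in row])
```

No explicit indexing by position is needed here: each expected cell depends only on its row total and its column total.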

The expected result was also shown as an image in the original post.

I assume that I need to iterate through rows and columns. I have tried to do it with:

for i, row in df_dropped.iterrows():
    for j, column in row.iteritems():
        data[row][column] = df_dropped.iloc[i, 3] * df_dropped.iloc[2, j]

This raises the following error: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

What am I doing wrong?

2 Answers

Use numpy.outer to compute the outer product of the last column (row_sum) and
the last row (col_sum), then divide the resulting NumPy array by the grand total, a scalar selected with loc:

import numpy as np
import pandas as pd
t = df.loc['col_sum', 'row_sum']
arr = np.outer(df['row_sum'], df.loc['col_sum']) / t
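
To see what this outer product contains, here is a small standalone sketch. It assumes the totals implied by the output printed on this page (row sums 55 and 325, column sums 170, 144 and 66, grand total 380); the arrays stand in for df['row_sum'] and df.loc['col_sum'] and are not taken from the asker's actual DataFrame:

```python
import numpy as np

# Stand-ins for df['row_sum'] and df.loc['col_sum'] (assumed values).
# Each ends with the grand total, because df carries a totals row and column.
row_sum = np.array([55, 325, 380])
col_sum = np.array([170, 144, 66, 380])

t = col_sum[-1]                       # grand total
arr = np.outer(row_sum, col_sum) / t  # shape (3, 4)

# The last row and last column just reproduce the totals themselves,
# so only the inner block holds expected frequencies.
core = arr[:-1, :-1]
print(core.round(6))
```

This is why the result is sliced with [:-1, :-1] before being turned into a DataFrame.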

Then create a DataFrame with the constructor, slicing off the last row and last column (the totals):

df1 = pd.DataFrame(arr[:-1, :-1], 
                   columns=df.columns[:-1],
                   index=df.index[:-1]).add_prefix('exp_')
print(df1)
   exp_satisfied  exp_neutral  exp_dissatisfied
0      24.605263    20.842105          9.552632
1     145.394737   123.157895         56.447368

Build the new column names, pairing each original column with its exp_ counterpart:

cols = [item for x in df.columns[:-1] for item in (x, 'exp_' + x)]
print(cols)
['satisfied', 'exp_satisfied', 'neutral', 'exp_neutral', 'dissatisfied', 'exp_dissatisfied']

Join the two with concat, then reindex to put the columns in the expected order:

df = pd.concat([df.iloc[:-1, :-1], df1], axis=1).reindex(columns=cols)
print(df)
   satisfied  exp_satisfied  neutral  exp_neutral  dissatisfied  \
0         30      24.605263       17    20.842105             8   
1        140     145.394737      127   123.157895            58   

   exp_dissatisfied  
0          9.552632  
1         56.447368  

5 Comments

Thank you, jezrael, it is beautifully simple.
Just one more question. Here is the final result:

def expected_frequency(data):
    """The function calculates expected frequency"""
    data['row_sum'] = data.sum(axis = 1)
    data.loc['col_sum'] = data.sum()
    t = data.loc['col_sum', 'row_sum']
    arr = np.outer(data['row_sum'], data.loc['col_sum']) / float(t)
    data2 = pd.DataFrame(arr[:-1, :-1], columns = data.columns[:-1]).add_prefix('exp_')
    data = pd.concat([data.iloc[:-1, :-1], data2], axis = 1)
    return data

expected_frequency(df_dropped)

My question: how do I store the table returned by the function as a permanent table?
@eponkratova - yes?
Not sure I understand. Do you mean assignment? df1 = expected_frequency(df) applies the function to the DataFrame called df and assigns the result to df1.
Yeah...that was easy. I thought of creating an empty df and then applying a function to it, if that makes sense. It is good to be new: you can come up with completely ridiculous solutions.

Jezrael gave a great answer that calculates the expected frequencies using numpy and pandas. You can also use the Python statistical library statsmodels to calculate these kinds of statistics.

For example, to calculate a table of expected frequencies, you could do:

import statsmodels.api as sm
# df should hold only the observed counts, without the row_sum / col_sum totals
expected_values = sm.stats.Table(df).fittedvalues

More info on: statsmodels contingency tables

1 Comment

Yeah. I also feel that crosstab could work. Thank you for a one-line solution!
