I have dataframes in a list as follows:

CGdfs = [CGdf_2002, CGdf_2003, CGdf_2004, CGdf_2005, CGdf_2006, CGdf_2007, CGdf_2008, CGdf_2009, CGdf_2010, CGdf_2011, CGdf_2012, CGdf_2013, CGdf_2014]

Columns in each dataframe are:

CGdf_2002 has columns: TSR_df_03_06, board_gender_diversity_percent, gics_sector_name, custom_region

CGdf_2003 has columns: TSR_df_04_07, board_gender_diversity_percent, gics_sector_name, custom_region

CGdf_2014 has columns: TSR_df_15_18, board_gender_diversity_percent, gics_sector_name, custom_region ...

I also have the TSR column names in a list:

TSR3yrdfs_string = ['TSR_df_03_06', 'TSR_df_04_07', 'TSR_df_05_08', 'TSR_df_06_09', 'TSR_df_07_10', 'TSR_df_08_11', 'TSR_df_09_12', 'TSR_df_10_13','TSR_df_11_14', 'TSR_df_12_15','TSR_df_13_16','TSR_df_14_17', 'TSR_df_15_18']

I want to run regressions on all these dataframes in a loop with the following formula:

sm.ols(formula = TSR_df_03_06 ~ board_gender_diversity_percent + gics_sector_name + custom_region, data=CGdf_2002).fit()

sm.ols(formula = TSR_df_04_07 ~ board_gender_diversity_percent + gics_sector_name + custom_region, data=CGdf_2003).fit()

sm.ols(formula = TSR_df_05_08 ~ board_gender_diversity_percent + gics_sector_name + custom_region, data=CGdf_2004).fit()

Each dataframe needs a different formula. I want to run all these regressions, up to CGdf_2014, in a loop.

Can someone give me a suggestion to achieve this?

I tried the following, but it fails with an "invalid syntax" error:

CGdfs = [CGdf_2002, CGdf_2003, CGdf_2004, CGdf_2005, CGdf_2006, CGdf_2007, CGdf_2008, CGdf_2009, CGdf_2010, CGdf_2011, CGdf_2012, CGdf_2013, CGdf_2014, CGdf_2015, CGdf_2016, CGdf_2017, CGdf_2018]

TSR3yrdfs_string = ['TSR_df_03_06', 'TSR_df_04_07', 'TSR_df_05_08', 'TSR_df_06_09', 'TSR_df_07_10', 'TSR_df_08_11', 'TSR_df_09_12', 'TSR_df_10_13','TSR_df_11_14', 'TSR_df_12_15','TSR_df_13_16','TSR_df_14_17', 'TSR_df_15_18']  

for x, y in zip(CGdfs, TSR3yrdfs_string):
    results = sm.ols(formula = x[y] ~ x['board_gender_diversity_percent'] + x['gics_sector_name'] + x['custom_region'], data=x).fit()
    print('The summary of regression is:', results.summary())

1 Answer

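You need to pass the formula as a string. In your code the formula is a bare Python expression: x[y], x['gics_sector_name'], ... are column lookups on the dataframe rather than column names, and ~ is not a binary operator in Python, which is what triggers the syntax error.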
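As a quick check of why the original line fails before statsmodels is ever called, the sketch below asks Python to parse a bare formula and a quoted one. The helper name parses is hypothetical and only used for illustration.

```python
def parses(expr: str) -> bool:
    """Return True if `expr` is a syntactically valid Python expression."""
    try:
        # compile() only parses the text; it does not look up x or y.
        compile(expr, "<formula>", "eval")
        return True
    except SyntaxError:
        return False


# In Python, `~` is only a unary operator (bitwise NOT), so `a ~ b` cannot be parsed.
bare = "x[y] ~ x['board_gender_diversity_percent']"

# Wrapped in quotes, the same kind of text is just a string literal.
quoted = repr("TSR_df_03_06 ~ board_gender_diversity_percent")

print(parses(bare))    # False
print(parses(quoted))  # True
```

The formula syntax is interpreted by statsmodels from the string, not by Python itself, so it has to reach sm.ols as text.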

You can rewrite your formula like this (using a formula_str variable for better readability):

formula_str = y + '~' + 'board_gender_diversity_percent + gics_sector_name + custom_region'
results = sm.ols(formula=formula_str, data=x).fit()

y is a string from your TSR3yrdfs_string list, and the other columns are hard-coded in a single string.
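Putting this into the loop from the question could look roughly like the sketch below. It assumes that sm in the question refers to statsmodels.formula.api (imported here under the alias smf) and that each dataframe contains the matching TSR column. The names build_formula, fit_all and PREDICTORS are hypothetical helpers, not part of statsmodels.

```python
PREDICTORS = ("board_gender_diversity_percent", "gics_sector_name", "custom_region")


def build_formula(response, predictors=PREDICTORS):
    """Build a formula string such as 'y ~ a + b' from column names."""
    return f"{response} ~ {' + '.join(predictors)}"


def fit_all(dfs, response_cols):
    """Fit one OLS model per (dataframe, response column) pair."""
    # Imported here so build_formula can be used without statsmodels installed.
    import statsmodels.formula.api as smf

    results = {}
    # zip stops at the shorter input, so any extra dataframes are silently skipped.
    for df, response in zip(dfs, response_cols):
        results[response] = smf.ols(formula=build_formula(response), data=df).fit()
    return results
```

With the lists from the question, results = fit_all(CGdfs, TSR3yrdfs_string) would give a dict keyed by TSR column name, and results['TSR_df_03_06'].summary() would show the first regression. Note that the CGdfs list in the attempted code has more entries than TSR3yrdfs_string, so only the first 13 dataframes would be used.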
