I am trying to build a dataframe that combines individual dataframes of county-level high school enrollment projections generated in a for loop.
I can do this for a single county, based on this SO question, and it works great. My goal now is to write a nested for loop that takes multiple county FIPS codes, filters on each code for the inner loop, and generates an 11-row dataframe per county that is then appended to a master dataframe. For three counties, for example, the final dataframe would be 33 rows.
But I haven't been able to get it right. I've tried to model it on this SO question and answer.
This is my starting dataframe:
import pandas as pd

df = pd.DataFrame({"year": ['2020_21', '2020_21', '2020_21'],
                   "county_fips": ['06019', '06021', '06023'],
                   "grade11": [5000, 2000, 2000],
                   "grade12": [5200, 2200, 2200],
                   "grade11_chg": [1.01, 1.02, 1.03],
                   "grade11_12_ratio": [0.9, 0.8, 0.87]})
df
This is my code with the nested loops. My intent is to run through the county codes in the outer loop and the projection year calculations in the inner loop.
projection_years=['2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31']
for i in df['county_fips'].unique():
    print(i)
    grade11_change=df.iloc[0]['grade11_chg']
    grade11_12_ratio=df.iloc[0]['grade11_12_ratio']
    full_name=[]
    for year in projection_years:
        #print(year)
        df_select=df[df['county_fips']==i]
        lr = df_select.iloc[-1]
        row = {}
        row['year'] = year
        row['county_fips'] = i
        row = {}
        row['grade11'] = int(lr['grade11'] * grade11_change)
        row['grade12'] = int(lr['grade11'] * grade11_12_ratio)
        df_select = df_select.append([row])
        full_name.append(df_select)
df_final=pd.concat(full_name)
df_final=df_final[['year','county_fips','grade11','grade12']]
print('Finished processing')
But I end up with NaN values and repeating years. Below is my desired output (I built this in Excel, and the numbers reflect rounding). (Update: this corrects the original df_final_goal.)
df_final_goal=pd.DataFrame({'year': {0: '2020_21', 1: '2021_22', 2: '2022_23', 3: '2023_24', 4: '2024_25', 5: '2025_26',
6: '2026_27', 7: '2027_28', 8: '2028_29', 9: '2029_30', 10: '2030_31', 11: '2020_21', 12: '2021_22', 13: '2022_23',
14: '2023_24', 15: '2024_25', 16: '2025_26', 17: '2026_27', 18: '2027_28', 19: '2028_29', 20: '2029_30', 21: '2030_31',
22: '2020_21', 23: '2021_22', 24: '2022_23', 25: '2023_24', 26: '2024_25', 27: '2025_26', 28: '2026_27', 29: '2027_28',
30: '2028_29', 31: '2029_30', 32: '2030_31'},
'county_fips': {0: '06019', 1: '06019', 2: '06019', 3: '06019', 4: '06019', 5: '06019', 6: '06019', 7: '06019', 8: '06019',
9: '06019', 10: '06019', 11: '06021', 12: '06021', 13: '06021', 14: '06021', 15: '06021', 16: '06021', 17: '06021', 18: '06021',
19: '06021', 20: '06021', 21: '06021', 22: '06023', 23: '06023', 24: '06023', 25: '06023', 26: '06023', 27: '06023',
28: '06023', 29: '06023', 30: '06023', 31: '06023', 32: '06023'},
'grade11': {0: 5000, 1: 5050, 2: 5101, 3: 5152, 4: 5203, 5: 5255, 6: 5308, 7: 5361, 8: 5414, 9: 5468, 10: 5523,
11: 2000, 12: 2040, 13: 2081, 14: 2122, 15: 2165, 16: 2208, 17: 2252, 18: 2297, 19: 2343, 20: 2390, 21: 2438,
22: 2000, 23: 2060, 24: 2122, 25: 2185, 26: 2251, 27: 2319, 28: 2388, 29: 2460, 30: 2534, 31: 2610, 32: 2688},
'grade12': {0: 5200, 1: 4500, 2: 4545, 3: 4590, 4: 4636, 5: 4683, 6: 4730, 7: 4777, 8: 4825, 9: 4873, 10: 4922,
11: 2200, 12: 1600, 13: 1632, 14: 1665, 15: 1698, 16: 1732, 17: 1767, 18: 1802, 19: 1838, 20: 1875, 21: 1912,
22: 2200, 23: 1740, 24: 1792, 25: 1846, 26: 1901, 27: 1958, 28: 2017, 29: 2078, 30: 2140, 31: 2204, 32: 2270}})
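For reference, the arithmetic behind df_final_goal appears to be a per-county recurrence: each projected grade11 is the previous year's grade11 times that county's grade11_chg, and each projected grade12 is the previous year's grade11 times that county's grade11_12_ratio, with rounding applied only to the stored values. Below is a minimal sketch of that recurrence, not a definitive fix. It rests on a few assumptions: the names rows and df_sketch are introduced here purely for illustration, rows are collected in a plain list and turned into a dataframe once at the end (DataFrame.append was removed in pandas 2.0), and Python's round is used, so a value that lands exactly on a half may differ by one from Excel's rounding.

```python
import pandas as pd

df = pd.DataFrame({"year": ['2020_21', '2020_21', '2020_21'],
                   "county_fips": ['06019', '06021', '06023'],
                   "grade11": [5000, 2000, 2000],
                   "grade12": [5200, 2200, 2200],
                   "grade11_chg": [1.01, 1.02, 1.03],
                   "grade11_12_ratio": [0.9, 0.8, 0.87]})

projection_years = ['2021_22', '2022_23', '2023_24', '2024_25', '2025_26',
                    '2026_27', '2027_28', '2028_29', '2029_30', '2030_31']

rows = []  # hypothetical name: one dict per output row
for rec in df.to_dict("records"):      # outer loop: one record per county
    # Keep the base-year row exactly as it is in the starting dataframe.
    rows.append({"year": rec["year"],
                 "county_fips": rec["county_fips"],
                 "grade11": rec["grade11"],
                 "grade12": rec["grade12"]})

    # Carry the unrounded grade11 forward so rounding errors do not compound.
    grade11 = float(rec["grade11"])
    for year in projection_years:      # inner loop: one row per projected year
        # grade12 is derived from the PREVIOUS year's grade11 ...
        grade12 = grade11 * rec["grade11_12_ratio"]
        # ... and only then is grade11 advanced by this county's own rate.
        grade11 = grade11 * rec["grade11_chg"]
        rows.append({"year": year,
                     "county_fips": rec["county_fips"],
                     "grade11": int(round(grade11)),
                     "grade12": int(round(grade12))})

# Build the combined dataframe once, after both loops have finished.
df_sketch = pd.DataFrame(rows)
print(df_sketch.shape)
```

The sketch reads each county's own grade11_chg and grade11_12_ratio rather than df.iloc[0], and it fills year and county_fips in the same dict as the grade columns.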
Thanks for any assistance.
county_fips (is it county_code?), grade11_chg or grade11_12_ratio columns. These 3 columns are used in the latter piece of code. Also note that the projection_years values are not all present in test dataframe df_final_goal; you have grade11 equaling values that don't match the output of your for-loop. E.g. 5000 * 1.01 != 6079 and 5200 * 0.9 != 5417. In fact the first row per [(year, county_fips)] group should be equal to your original df but they aren't. Is something off with the final? Or the original?