Suppose I have a DataFrame like this:

import pandas as pd

t = {'Tract_number': ['01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100', '01001020100'],
    'Year': [2019, 2014, 2015, 2016, 2017, 2018, 2011, 2020, 2010, 2009, 2012, 2013],
    'Median_household_income': [70625.0, 65800.0, 67356.0, 68750.0, 70486.0, 70385.0, 66953.0, 70257.0, 71278.0, 'nan', 65179.0, 65114.0], 
    'Total_Asian_Population': [2.0, 12.0, 12.0, 9.0, 22.0, 17.0, 0.0, 41.0, 0.0, 'nan', 0.0, 0.0],
    'Total_bachelors_degree': [205.0, 173.0, 166.0, 216.0, 261.0, 236.0, 139.0, 'nan', 170.0, 'nan', 156.0, 183.0], 
    'Total_graduate_or_professional_degree': [154.0, 149.0, 176.0, 191.0, 215.0, 174.0, 117.0, 'nan', 146.0, 'nan', 131.0, 127.0], 
    'Median_gross_rent': [749.0, 738.0, 719.0, 484.0, 780.0, 827.0, 398.0, 820.0, 680.0, 'nan', 502.0, 525.0]}
df_sample = pd.DataFrame(data=t)

Now suppose I want to build a dictionary with this structure:

A = {
    '01001020100': {
        'Median_household_income': {'2010': 11235, '2011': 13253},
        'Total_Asian_Population': {'2010': 1234, ...},
    }
}

How would I do this?

I was going about it like this:

d = {'Tract_number': df_sample['Tract_number'].iloc[0]}
e = {
    'Median_household_income': pd.Series(df_sample.Median_household_income.values,index=df_sample.Year).to_dict(),
    'Total_Asian_Population': pd.Series(df_sample.Total_Asian_Population.values,index=df_sample.Year).to_dict(),
    'Total_bachelors_degree': pd.Series(df_sample.Total_bachelors_degree.values,index=df_sample.Year).to_dict(),
    'Total_graduate_or_professional_degree': pd.Series(df_sample.Total_graduate_or_professional_degree.values,index=df_sample.Year).to_dict(),
    'Median_gross_rent': pd.Series(df_sample.Median_gross_rent.values,index=df_sample.Year).to_dict()
}
f = {}
f[d['Tract_number']] = e
f

Then I would just nest e under the tract number, as above, but is there a more Pythonic way of doing this? Any help is appreciated.

  • I assume you can do some bad-ass pandas groupby/set_index operations which "jezrael" most likely would post as an answer Commented Nov 7, 2022 at 5:37

1 Answer

With the dataframe you provided, here is one way to do it with Pandas groupby and MultiIndex.get_level_values, plus the median function from the statistics module of the Python standard library:

import pandas as pd
from statistics import median

df = (
    pd.DataFrame(data=t)
    .sort_values(["Tract_number", "Year"])
    .groupby(["Tract_number", "Year"])
    .agg({"Median_household_income": median, "Total_Asian_Population": sum})
)

A = {
    key: {
        "Median_household_income": df.loc[(key,), "Median_household_income"].to_dict(),
        "Total_Asian_Population": df.loc[(key,), "Total_Asian_Population"].to_dict(),
    }
    for key in df.index.get_level_values(0).unique()
}

Then:

print(A)
# Output
{
    "01001020100": {
        "Median_household_income": {
            2009: "nan",
            2010: 71278.0,
            2011: 66953.0,
            2012: 65179.0,
            2013: 65114.0,
            2014: 65800.0,
            2015: 67356.0,
            2016: 68750.0,
            2017: 70486.0,
            2018: 70385.0,
            2019: 70625.0,
            2020: 70257.0,
        },
        "Total_Asian_Population": {
            2009: "nan",
            2010: 0.0,
            2011: 0.0,
            2012: 0.0,
            2013: 0.0,
            2014: 12.0,
            2015: 12.0,
            2016: 9.0,
            2017: 22.0,
            2018: 17.0,
            2019: 2.0,
            2020: 41.0,
        },
    }
}
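
The approach above lists each column by hand. As a possible alternative, here is a minimal sketch that covers every value column at once by grouping on Tract_number and calling DataFrame.to_dict on each group. It assumes each (Tract_number, Year) pair occurs only once, so no aggregation is needed, and it converts the 'nan' strings to real NaN values first. The names sample, value_cols and nested are illustrative, and the frame is a three-row excerpt of the question's data.

```python
import pandas as pd

# Three-row excerpt of the question's data (illustrative subset).
sample = pd.DataFrame(
    {
        "Tract_number": ["01001020100"] * 3,
        "Year": [2010, 2009, 2011],
        "Median_household_income": [71278.0, "nan", 66953.0],
        "Total_Asian_Population": [0.0, "nan", 0.0],
    }
)

# Every column except the two key columns holds values to nest.
value_cols = sample.columns.difference(["Tract_number", "Year"])

# Turn the 'nan' strings into real NaN floats so the columns are numeric.
sample[value_cols] = sample[value_cols].apply(pd.to_numeric, errors="coerce")

# One entry per tract; to_dict() yields {column: {year: value}}.
nested = {
    tract: group.drop(columns="Tract_number")
    .set_index("Year")
    .sort_index()
    .to_dict()
    for tract, group in sample.groupby("Tract_number")
}

print(nested)
```

If the years should be string keys, as in the structure the question asks for, casting Year with astype(str) before set_index should do it.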