I have population data and want to create a separate DataFrame for each state and year. The idea is the following:

for i in province_id:
    for j in year:
         sub_data_i_j = data[(data.provid==i) &(data.wave==j)]

However, I am not sure how to generate sub_data_i_j dynamically.

3 Answers

This should do it when run at module level (a top-level script or an interactive session):

for i in province_id:
    for j in year:
        locals()['sub_data_{}_{}'.format(i,j)] = data[(data.provid==i) & (data.wave==j)]
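One caveat with the `locals()` approach: inside a function, writing to the dictionary returned by `locals()` does not create a usable local variable. The following is a minimal sketch of that behaviour; the function name `make_subsets` and the placeholder value are made up for illustration and stand in for the real DataFrame subsets.

```python
def make_subsets():
    # Writing into the dict returned by locals() inside a function
    # does not define a real local variable.
    locals()['sub_data_1_2014'] = 'placeholder'
    try:
        # The name is resolved as a global, which was never defined.
        return sub_data_1_2014
    except NameError:
        return None


result = make_subsets()
print(result)
```

At module level `locals()` is the same dictionary as `globals()`, which is why the loop above works in a plain script.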

I initially suggested using exec, which is generally discouraged for safety reasons. That said, if your code is not exposed to anyone with malicious intent, it should be OK, and I'll leave it here for the sake of completeness:

for i in province_id:
    for j in year:
        exec("sub_data_{}_{} = data[(data.provid==i) & (data.wave==j)]".format(i, j))

Nevertheless, for most use cases, it's probably better to use a collection of some sort, e.g. a dictionary, because it will be cumbersome to reference dynamically generated variable names in subsequent parts of your code. It's also a one-liner:

data_dict = {key:g for key,g in data.groupby(['provid','wave'])}
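As a rough illustration of how the dictionary is used afterwards, here is a small sketch. The sample values and the `population` column are assumptions made up for this example; only the `provid` and `wave` column names come from the question. When grouping by two columns, each key is a `(provid, wave)` tuple.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    'provid': [1, 1, 2, 2],
    'wave': [2010, 2011, 2010, 2010],
    'population': [100, 110, 200, 210],
})

# One DataFrame per (provid, wave) combination that occurs in the data.
data_dict = {key: g for key, g in data.groupby(['provid', 'wave'])}

# Look up a single subset by its (provid, wave) tuple.
subset = data_dict[(2, 2010)]
print(subset)
```

Only combinations that actually occur in the data get a key, so looking up a pair that is absent raises a `KeyError`.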

1 Comment

I agree. The second method is more pythonic. Thanks!

I think the best approach is to filter first with boolean indexing and then create a dictionary of DataFrames with groupby:

import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'wave':[2004,2005,2004,2005,2005,2004],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'provid':list('aaabbb')})

print (df)
   A  C  D  E provid  wave
0  a  7  1  5      a  2004
1  b  8  3  3      a  2005
2  c  9  5  6      a  2004
3  d  4  7  9      b  2005
4  e  2  1  2      b  2005
5  f  3  0  4      b  2004


province_id = ['a','b']
year = [2004]
df = df[(df.provid.isin(province_id)) &(df.wave.isin(year))]
print (df)
   A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004
5  f  3  0  4      b  2004

dfs = {'{0[0]}_{0[1]}'.format(i) : x for i, x in df.groupby(['provid','wave'])}

Another solution:

dfs = dict(tuple(df.groupby(df['provid'] + '_' + df['wave'].astype(str))))

print (dfs)
{'a_2004':    A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004, 'b_2004':    A  C  D  E provid  wave
5  f  3  0  4      b  2004}

Finally, you can select each DataFrame by its key:

print (dfs['b_2004'])
   A  C  D  E provid  wave
5  f  3  0  4      b  2004

Your loop can be changed to store each subset in a dictionary:

sub_data = {}
province_id = ['a','b']
year = [2004]
for i in province_id:
    for j in year:
        sub_data[i + '_' + str(j)] = df[(df.provid == i) & (df.wave == j)]

print (sub_data)
{'a_2004':    A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004, 'b_2004':    A  C  D  E provid  wave
5  f  3  0  4      b  2004}

3 Comments

And by the time I post you already got a big answer... nice +1
I mean, by the time I finished my "answer" you already got an answer with examples and "other solutions". You are quick
Thanks for the detailed answer!

My suggestion:

import io
import pandas as pd
from collections import defaultdict

string = u"""province_id,wave,value
1,2014,10
1,2014,10
1,2013,10
2,2010,10
3,2010,10"""

df = pd.read_csv(io.StringIO(string))

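# Nested dictionary: province_id -> wave -> DataFrame
d = defaultdict(dict)

# This splits the dataframe by province_id and wave
dfs = df.groupby(["province_id","wave"])

# Loop through the groups and structure them
# (the loop variable is named group so it does not overwrite df)
for ind, group in dfs:
    d[ind[0]][ind[1]] = group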

The resulting dictionary structure looks like this (shown schematically: the actual keys are integers and the values are DataFrames):

{
  "1": {
    "2013": "dataframe: 1 2013", 
    "2014": "dataframe: 1 2014"
  }, 
  "2": {
    "2010": "dataframe: 2 2010"
  }, 
  "3": {
    "2010": "dataframe: 3 2010"
  }
}

And you access the dataframes by e.g.:

d[1][2013]
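One thing to keep in mind with `defaultdict`: looking up a province that is not in the data silently inserts an empty inner dictionary instead of raising an error. The sketch below uses placeholder strings in place of DataFrames (as in the schematic above), and the name `frozen` is only illustrative. Converting to a plain `dict` once the structure is built is one possible way to avoid that side effect.

```python
from collections import defaultdict

# Placeholder strings stand in for the DataFrames.
d = defaultdict(dict)
d[1][2013] = "dataframe: 1 2013"
d[1][2014] = "dataframe: 1 2014"
d[2][2010] = "dataframe: 2 2010"

# A plain-dict copy: lookups no longer create missing keys.
frozen = dict(d)

# Safe lookup of a combination that may not exist; nothing is inserted.
value = frozen.get(99, {}).get(2013)
print(value)

# By contrast, indexing the defaultdict adds an empty entry for 99.
d[99]
print(99 in d, 99 in frozen)
```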

1 Comment

Thanks for introducing the defaultdict class.
