Split DataFrame into a dictionary of groups from multiple columns

Question

I have a dataframe like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'Client': ['A', 'B', 'C', 'D', 'E'],
        'Revenue': [100, 120, 50, 40, 30],
        'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
        'Quarter': [np.nan, 1, 3, 4, np.nan],
        'Year': [2017, 2016, 2015, 2017, 2016],
    })

How do I split the DataFrame into a two-dimensional dictionary of DataFrames,
ds[year][quarter], with one entry for each year and quarter?

Right now I can build a one-dimensional dictionary as follows:

    years = df['Year'].unique().tolist()
    mc = {elem: pd.DataFrame for elem in years}

    for year in years:
        mc[year] = df.loc[df['Year'] == year]

This gives me a dictionary of DataFrames, mc[2015], mc[2016], etc.,
and I then have to apply the same step again to each of them.

I was hoping there would be a modification of the code:

    mc = {elem: pd.DataFrame for elem in years}

to create a two-dimensional (or even multi-dimensional) dictionary at once, so that the data can be split faster.

piRSquared · Accepted Answer · 2017-08-24 22:13:23Z

3

from collections import defaultdict

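# Build d[year][quarter] -> sub-DataFrame.
# groupby skips rows whose Quarter is NaN, so the 'FY' rows are not included.
d = defaultdict(dict)
for (y, q), g in df.groupby(['Year', 'Quarter']):
    d[y][q] = g
d = dict(d)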

for y, v in d.items():
    print(y)
    for q, s in v.items():
        print('    ' + str(q))
        # Indent every line of the DataFrame's repr by eight spaces.
        p = '\n'.join('        ' + line for line in repr(s).split('\n'))
        print(p, '\n')

2015
    3.0
          Client FYoQ  Quarter  Revenue  Year
        2      C    Q      3.0       50  2015 

2016
    1.0
          Client FYoQ  Quarter  Revenue  Year
        1      B    Q      1.0      120  2016 

2017
    4.0
          Client FYoQ  Quarter  Revenue  Year
        3      D    Q      4.0       40  2017
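
Note that clients A and E do not appear in the output above: their Quarter is NaN, and groupby drops rows with a NaN key by default. If the full-year rows should be kept, one option is to give every row an explicit period label before grouping. The following is only a sketch under that assumption; the `Period` column, the `period_label` helper, and the `'FY'`/`'Q1'` labels are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'Revenue': [100, 120, 50, 40, 30],
    'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
    'Quarter': [np.nan, 1, 3, 4, np.nan],
    'Year': [2017, 2016, 2015, 2017, 2016],
})


def period_label(quarter):
    # Hypothetical labelling scheme: 'FY' for full-year rows, 'Q1'..'Q4' otherwise.
    return 'FY' if pd.isna(quarter) else 'Q{}'.format(int(quarter))


# Add a helper column that has no missing values, so no row is dropped.
labelled = df.assign(Period=df['Quarter'].map(period_label))

# Build nested[year][period] -> sub-DataFrame.
nested = {}
for (year, period), group in labelled.groupby(['Year', 'Period']):
    nested.setdefault(int(year), {})[period] = group

print(sorted(nested[2017]))
print(nested[2016]['FY'])
```

With this layout, `nested[2017]['FY']` and `nested[2017]['Q4']` are both available, so the full-year and quarterly rows for a year sit side by side.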

edited Aug 24, 2017 at 22:13

answered Aug 24, 2017 at 22:03


3 Comments

cs95 Over a year ago

As a usage note (to OP): a single-level dictionary gives faster access than a nested one, which requires two separate lookups.

piRSquared Over a year ago

Agreed! But OP did ask for 2-D. The way I see it, this is what OP asked for; yours is what OP needs.

Alhpa Delta Over a year ago

Thanks, let me try this out. Looks like you both agree that the other solution is better/faster.

cs95 · Accepted Answer · 2017-08-25 17:49:29Z

3

IIUC, you could set a multi-index using df.set_index, followed by a df.groupby call on both index levels. Then build your dictionary inside a dict comprehension:

dict_ = {i : g for i, g in df.set_index(['Year', 'Quarter']).groupby(level=[0, 1])}

for k in dict_:
    print(dict_[k])

             Client FYoQ  Revenue
Year Quarter                     
2016 1.0          B    Q      120


             Client FYoQ  Revenue
Year Quarter                     
2015 3.0          C    Q       50


             Client FYoQ  Revenue
Year Quarter                     
2017 4.0          D    Q       40

The keys are (year, quarter) tuples, which are very manageable.
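
As a rough illustration of working with tuple keys, the sketch below rebuilds the dictionary from the question's sample data, fetches one group with a single lookup, and collects all quarters of one year by filtering the keys. The names `by_key`, `q3_2015`, and `year_2017` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'Revenue': [100, 120, 50, 40, 30],
    'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
    'Quarter': [np.nan, 1, 3, 4, np.nan],
    'Year': [2017, 2016, 2015, 2017, 2016],
})

# Same construction as above: keys are (year, quarter) tuples.
by_key = {
    key: group
    for key, group in df.set_index(['Year', 'Quarter']).groupby(level=[0, 1])
}

# A single lookup retrieves one group.
q3_2015 = by_key[(2015, 3.0)]
print(q3_2015)

# All quarters of one year: filter on the first element of each key.
year_2017 = {q: g for (y, q), g in by_key.items() if y == 2017}
print(list(year_2017))
```

Filtering the keys this way recovers the per-year view that a nested dictionary would give, without building a second level.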

To save to a CSV file, the last loop will need a .to_csv call:

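for (year, quarter), group in dict_.items():
    # format() needs two separate arguments; passing the whole key as one raises IndexError.
    # Quarter is a float (the column held NaN), so cast to int to get names like data2015Q3.csv.
    label = 'data{}Q{}.csv'.format(int(year), int(quarter))
    group.to_csv(label)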

edited Aug 25, 2017 at 17:49

answered Aug 24, 2017 at 21:49


4 Comments

Alhpa Delta Over a year ago

Thanks, let me try this out.

Alhpa Delta Over a year ago

Thanks, this does work. I was wondering how to modify the last for loop so that I can write the resulting DataFrames out with .to_csv, with the files named automatically, like "data2015Q1.csv", "data2015Q2.csv", ..., "data2016Q4.csv".

cs95 Over a year ago

@AlhpaDelta Edited. You'll need .to_csv.

Alhpa Delta Over a year ago

It gives the error "IndexError: tuple index out of range". Everything worked beautifully up to that point (thank you!). If I could get it to write the files, that would be perfect; the actual data covers several years, with 4 quarters in each.
