
I have a dataframe like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'Client': ['A', 'B', 'C', 'D', 'E'],
        'Revenue': [100, 120, 50, 40, 30],
        'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
        'Quarter': [np.nan, 1, 3, 4, np.nan],
        'Year': [2017, 2016, 2015, 2017, 2016],
    })

How do I split the DataFrame into a two-dimensional dictionary of DataFrames, ds[year][quarter], with one entry per year and quarter?

Right now I can build a one-dimensional dictionary as follows:

    years = df['Year'].unique().tolist()
    mc = {elem: pd.DataFrame for elem in years}

    for year in years:
        mc[year] = df.loc[df['Year'] == year]

This gives me a dictionary of DataFrames, mc[2015], mc[2016], etc., and then I have to apply the same split again to each of them.
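For comparison, the same one-level split can be written without pre-allocating the dictionary, because iterating over a pandas GroupBy object yields (key, sub-DataFrame) pairs. A minimal sketch using the question's data:

```python
import numpy as np
import pandas as pd

# The question's example data
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'Revenue': [100, 120, 50, 40, 30],
    'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
    'Quarter': [np.nan, 1, 3, 4, np.nan],
    'Year': [2017, 2016, 2015, 2017, 2016],
})

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# so one dict comprehension replaces both the pre-allocation and the loop.
mc = {year: group for year, group in df.groupby('Year')}

print(mc[2017])
```

Every row is kept in this split, including the FY rows, because the Year column has no missing values.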

I was hoping there would be a modification of the code:

    mc = {elem: pd.DataFrame for elem in years}

that creates a two-dimensional (or even multi-dimensional) dictionary in one step, so the data can be split faster.


2 Answers

from collections import defaultdict

# Group on both keys, then file each sub-frame under d[year][quarter]
d = defaultdict(dict)
for (y, q), g in df.groupby(['Year', 'Quarter']):
    d[y][q] = g
d = dict(d)

for y, v in d.items():
    print(y)
    for q, s in v.items():
        print('    ' + str(q))
        # Indent every line of the frame's text representation for display
        p = '\n'.join('        ' + line for line in repr(s).split('\n'))
        print(p, '\n')

2015
    3.0
          Client FYoQ  Quarter  Revenue  Year
        2      C    Q      3.0       50  2015 

2016
    1.0
          Client FYoQ  Quarter  Revenue  Year
        1      B    Q      1.0      120  2016 

2017
    4.0
          Client FYoQ  Quarter  Revenue  Year
        3      D    Q      4.0       40  2017 
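If the nested ds[year][quarter] layout is what you want, the two levels can also be built in one expression by nesting two groupby dict comprehensions, with no defaultdict or setdefault. A minimal sketch, reusing the question's data and its ds name:

```python
import numpy as np
import pandas as pd

# The question's example data
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'Revenue': [100, 120, 50, 40, 30],
    'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
    'Quarter': [np.nan, 1, 3, 4, np.nan],
    'Year': [2017, 2016, 2015, 2017, 2016],
})

# Outer split by Year, inner split by Quarter within each year's sub-frame.
# groupby drops rows whose key is NaN by default, so the FY rows
# (Quarter = NaN) do not appear in any inner dictionary.
ds = {
    year: {quarter: g for quarter, g in year_df.groupby('Quarter')}
    for year, year_df in df.groupby('Year')
}

print(ds[2015][3.0])
```

One difference from the defaultdict version: because the outer split runs on Year alone, a year whose rows are all FY would still appear, mapped to an empty inner dictionary.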

3 Comments

As a usage note (to OP): a single-level dictionary gives faster access than a nested one, which requires two separate lookups.
Agreed! But OP did ask for 2-D. I view it as: this is what OP asked for; yours is what OP needs.
Thanks, let me try this out. Looks like you both agree the other solution is better/faster.

IIUC, you could set a MultiIndex with df.set_index, group on both index levels with groupby(level=[0, 1]), and build your dictionary with a dict comprehension:

dict_ = {i: g for i, g in df.set_index(['Year', 'Quarter']).groupby(level=[0, 1])}

for k in dict_:
    print(dict_[k])

             Client FYoQ  Revenue
Year Quarter                     
2016 1.0          B    Q      120


             Client FYoQ  Revenue
Year Quarter                     
2015 3.0          C    Q       50


             Client FYoQ  Revenue
Year Quarter                     
2017 4.0          D    Q       40

The keys are (year, quarter) tuples such as (2015, 3.0), which are easy to work with.
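Looking up a group by its tuple key is a single dictionary access. A short sketch, assuming the question's df; keys_2017 is a hypothetical name used only for illustration. The Quarter values are floats because the NaN in that column forces it to float64:

```python
import numpy as np
import pandas as pd

# The question's example data
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'Revenue': [100, 120, 50, 40, 30],
    'FYoQ': ['FY', 'Q', 'Q', 'Q', 'FY'],
    'Quarter': [np.nan, 1, 3, 4, np.nan],
    'Year': [2017, 2016, 2015, 2017, 2016],
})

dict_ = {k: g for k, g in df.set_index(['Year', 'Quarter']).groupby(level=[0, 1])}

# One lookup with a (year, quarter) tuple. Quarter is stored as 3.0,
# but 3 would also match, since 3 == 3.0 and both hash the same.
print(dict_[(2015, 3.0)])

# Hypothetical helper list: every key belonging to a single year
keys_2017 = [k for k in dict_ if k[0] == 2017]
print(keys_2017)
```

As with the other answer, the FY rows (Quarter = NaN) are dropped by groupby, so they have no key in dict_.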


To save each group to its own CSV file, call .to_csv inside the loop:

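for (year, quarter), frame in dict_.items():
    # Quarter is a float (e.g. 3.0) because the column contains NaN,
    # so cast to int to get names like data2015Q3.csv
    label = 'data{}Q{}.csv'.format(int(year), int(quarter))
    frame.to_csv(label)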

4 Comments

Thanks, let me try this out.
Thanks, this does work. I was wondering how I can modify the last for loop to write each resulting frame with to_csv, with the file names generated automatically, like "data2015Q1.csv", "data2015Q2.csv", ..., "data2016Q4.csv"...
@AlhpaDelta Edited. You'll need .to_csv.
It gives the error "IndexError: tuple index out of range". It has worked beautifully so far (thank you!). If I could get it to write the files, that would be perfect; the actual data spans several years with 4 quarters in each.
