0

I have the following df, weekly spend in a number of shops:

             shop1    shop2    shop3    shop4    shop5    shop6    shop7  \
date_week                                                                  
2          4328.85  5058.17  3028.68  2513.28  4204.10  1898.26  2209.75   
3          5472.00  5085.59  3874.51  1951.60  2984.71  1416.40  1199.42   
4          4665.53  4264.05  2781.70  2958.25  4593.46  2365.88  2079.73   
5          5769.36  3460.79  3072.47  1866.19  3803.12  2166.84  1716.71   
6          6267.00  4033.58  4053.70  2215.04  3991.31  2382.02  1974.92   
7          5436.83  4402.83  3225.98  1761.87  4202.22  2430.71  3091.33   
8          4850.43  4900.68  3176.00  3280.95  3483.53  4115.09  2594.01   
9          6782.88  3800.03  3865.65  2221.43  4116.28  2638.28  2321.55   
10         6248.18  4096.60  5186.52  3224.96  3614.24  2541.00  2708.36   
11         4505.18  2889.33  2937.74  2418.34  5565.57  1570.55  1371.54   
12         3115.26  1216.82  1759.49  2559.81  1403.61  1550.77   478.34   
13         4561.82   827.16  4661.51  3197.90  1515.63  1688.57   247.25   

             shop8    shop9  
date_week                    
2          3578.81  3134.39  
3          4625.10  2676.20  
4          3417.16  3870.00  
5          3980.78  3439.60  
6          3899.42  4192.41  
7          4190.60  3989.00  
8          4786.40  3484.51  
9          6433.02  3474.66  
10         4414.19  3809.20  
11         3590.10  3414.50  
12         4297.57  2094.00  
13         3963.27   871.25  

If I plot these in a line plot or "spaghetti plot" It works fine.

The goal is the look at trend in weekly sales over the last three months in 9 stores.

But looks a bit messy:

newgraph.plot()

enter image description here

I had a look at similar questions such as this one which uses df.interpolate() but it looks like I need to have missing values in there first. this answer seems to require a time series.

Is there another method to smoothen out the lines?

It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine. All I am interested in is the trend over the last number of weeks. I have also tried logy=True in the plot() method to calm the lines a bit, but it didn't help.

My df, for pd.DataFrame.fromt_dict():

{'shop1': {2: 4328.849999999999,
  3: 5472.0,
  4: 4665.530000000001,
  5: 5769.36,
  6: 6267.0,
  7: 5436.83,
  8: 4850.43,
  9: 6782.879999999999,
  10: 6248.18,
  11: 4505.18,
  12: 3115.26,
  13: 4561.82},
 'shop2': {2: 5058.169999999993,
  3: 5085.589999999996,
  4: 4264.049999999997,
  5: 3460.7899999999977,
  6: 4033.579999999998,
  7: 4402.829999999999,
  8: 4900.679999999997,
  9: 3800.0299999999997,
  10: 4096.5999999999985,
  11: 2889.3300000000004,
  12: 1216.8200000000002,
  13: 827.16},
 'shop3': {2: 3028.679999999997,
  3: 3874.5099999999984,
  4: 2781.6999999999994,
  5: 3072.4699999999984,
  6: 4053.6999999999966,
  7: 3225.9799999999987,
  8: 3175.9999999999973,
  9: 3865.6499999999974,
  10: 5186.519999999996,
  11: 2937.74,
  12: 1759.49,
  13: 4661.509999999998},
 'shop4': {2: 2513.2799999999997,
  3: 1951.6000000000001,
  4: 2958.25,
  5: 1866.1900000000003,
  6: 2215.04,
  7: 1761.8700000000001,
  8: 3280.9499999999994,
  9: 2221.43,
  10: 3224.9600000000005,
  11: 2418.3399999999997,
  12: 2559.8099999999995,
  13: 3197.9},
 'shop5': {2: 4204.0999999999985,
  3: 2984.71,
  4: 4593.459999999999,
  5: 3803.12,
  6: 3991.31,
  7: 4202.219999999999,
  8: 3483.529999999999,
  9: 4116.279999999999,
  10: 3614.24,
  11: 5565.569999999997,
  12: 1403.6100000000001,
  13: 1515.63},
 'shop6': {2: 1898.260000000001,
  3: 1416.4000000000005,
  4: 2365.8799999999997,
  5: 2166.84,
  6: 2382.019999999999,
  7: 2430.71,
  8: 4115.0899999999965,
  9: 2638.2800000000007,
  10: 2541.0,
  11: 1570.5500000000004,
  12: 1550.7700000000002,
  13: 1688.5700000000004},
 'shop7': {2: 2209.75,
  3: 1199.42,
  4: 2079.7300000000005,
  5: 1716.7100000000005,
  6: 1974.9200000000005,
  7: 3091.329999999999,
  8: 2594.0099999999993,
  9: 2321.5499999999997,
  10: 2708.3599999999983,
  11: 1371.5400000000004,
  12: 478.34,
  13: 247.25000000000003},
 'shop8': {2: 3578.8100000000004,
  3: 4625.1,
  4: 3417.1599999999994,
  5: 3980.7799999999997,
  6: 3899.4200000000005,
  7: 4190.600000000001,
  8: 4786.4,
  9: 6433.019999999998,
  10: 4414.1900000000005,
  11: 3590.1,
  12: 4297.57,
  13: 3963.27},
 'shop9': {2: 3134.3900000000003,
  3: 2676.2,
  4: 3870.0,
  5: 3439.6,
  6: 4192.41,
  7: 3989.0,
  8: 3484.51,
  9: 3474.66,
  10: 3809.2,
  11: 3414.5,
  12: 2094.0,
  13: 871.25}}
6
  • 2
    What are you trying to show with your plot? Defining you goals will make "It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine" a lot clearer Commented Apr 27, 2020 at 12:48
  • Smoothing data can always introduce artifacts which could suggest results you don't have. Especially if the lines are as close together as in your plot. You could try a different plot type. E.g., stacked histogram? Think about what you want to see in this plot. Is it one shop thats sticking out? Then draw a thicker line for this one and decrease the alpha values for the other lines. I think this plot, as it is, would overwhelm the reader (?). Commented Apr 27, 2020 at 12:53
  • The goal is the look at trend in weekly sales over the last three months in 9 stores . @FlyingTeller the reason I said "It doesn't matter if the values are not exactly accurate anymore, some interpolation is fine" is due to anticipating cmosig's comment saying this smoothing might suggest results I don't have. I just want to visualize the trend of all 9 shops at the same time, therefore exact values are not vital. Commented Apr 27, 2020 at 13:26
  • @cmosig I'm actually interested in all of these shops and how they are trending towards the end of the graph (as we approach Covid lockdown). Which ones have increased in weekly sales and which ones are suffering? Exact dollar amount is not important, just the trend. Shop #3 is increasing, #7 is negatively affected etc etc. If you have other ideas on how to viualize this in an easier to digest format I'd appreciate it. Commented Apr 27, 2020 at 13:34
  • @SCool how about displaying the difference between absolute shop sales in the respective week and average shop sales over the entire period on the y-axis? And then draw an axhline at 0. Maybe this will improve things. [example with bar plot] (qrb-bw.de/pdf_pool/pythondemo/cdpydoc/posnegbar.htm) Commented Apr 27, 2020 at 13:40

1 Answer 1

1

You could show the trend by plotting a regression line for the last few weeks, perhaps separately from the actual data, as the plot is already so crowded. I would use seaborn, because it has the convenient regplot() function:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

df.plot(figsize=[12, 10], style='--')
plt.xlim(2, 18)

last4 = df[len(df)-4:]

plt.gca().set_prop_cycle(None)
for shop in df.columns:
    sns.regplot(last4.index + 4, shop, data=last4, ci=None, scatter=False)
plt.ylabel(None)
plt.xticks(list(df.index)+[14, 17], labels=list(df.index)+[10, 13]);

enter image description here

Sign up to request clarification or add additional context in comments.

4 Comments

That's much more pleasing to the eye, thank you. We are using the exact same data, but I'm getting the error: AttributeError: 'Int64Index' object has no attribute 'mean'. It seems to be caused by the regplot loop here: for shop in df.columns: sns.regplot(last4.index + 4, shop, data=last4, ci=None, scatter=False)., I think last4.index might be the problem. I tried changing it to a list but it didn't work very well. Any ideas?
@SCool Hmm, maybe a version issue? I'm using Python 3.7.7, pandas 1.0.3, matplotlib 3.1.3, and seaborn 0.10.0.
You could also try to pass an explicit list: sns.regplot([14, 15, 16, 17], shop, data=last4, ci=None, scatter=False)
I just updated seaborn and matplotlib. Works now. Thanks

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.