How to plot large range values with matplotlib?

Question

I have to run long-duration soak tests and capture 3 datasets (before the run, in between, and after the run), plot them, and manually analyze the plots.

All the datasets span a very large range (0 to 10^5). So when I plot this data using matplotlib's bar function, the bars for the smaller values are too small to be analyzed.

import matplotlib
matplotlib.use('Agg')

import numpy
import matplotlib.pyplot as plt

bx = ('smmpg_b1024k', 'smmpg_b10k', 'smmpg_b11k', 'smmpg_b128', 'smmpg_b128k', 'smmpg_b12k', 'smmpg_b13k', 'smmpg_b14k', 'smmpg_b15k', 'smmpg_b160', 'smmpg_b16k', 'smmpg_b17k', 'smmpg_b18k', 'smmpg_b192', 'smmpg_b192k', 'smmpg_b19k', 'smmpg_b1k', 'smmpg_b20k', 'smmpg_b21k', 'smmpg_b224', 'smmpg_b22k', 'smmpg_b23k', 'smmpg_b24k', 'smmpg_b256', 'smmpg_b256k', 'smmpg_b25k', 'smmpg_b26k', 'smmpg_b27k', 'smmpg_b288', 'smmpg_b28k', 'smmpg_b29k', 'smmpg_b2k', 'smmpg_b30k', 'smmpg_b31k', 'smmpg_b32', 'smmpg_b320', 'smmpg_b320k', 'smmpg_b32k', 'smmpg_b33k', 'smmpg_b34k', 'smmpg_b352', 'smmpg_b35k', 'smmpg_b36k', 'smmpg_b37k', 'smmpg_b384', 'smmpg_b384k', 'smmpg_b38k', 'smmpg_b39k', 'smmpg_b3k', 'smmpg_b40k', 'smmpg_b416', 'smmpg_b41k', 'smmpg_b42k', 'smmpg_b43k', 'smmpg_b448', 'smmpg_b448k', 'smmpg_b44k', 'smmpg_b45k', 'smmpg_b46k', 'smmpg_b47k', 'smmpg_b480', 'smmpg_b48k', 'smmpg_b49k', 'smmpg_b4k', 'smmpg_b50k', 'smmpg_b512', 'smmpg_b512k', 'smmpg_b51k', 'smmpg_b52k', 'smmpg_b53k', 'smmpg_b544', 'smmpg_b54k', 'smmpg_b55k', 'smmpg_b56k', 'smmpg_b576', 'smmpg_b576k', 'smmpg_b57k', 'smmpg_b58k', 'smmpg_b59k', 'smmpg_b5k', 'smmpg_b608', 'smmpg_b60k', 'smmpg_b61k', 'smmpg_b62k', 'smmpg_b63k', 'smmpg_b64', 'smmpg_b640', 'smmpg_b640k', 'smmpg_b64k', 'smmpg_b672', 'smmpg_b6k', 'smmpg_b704', 'smmpg_b704k', 'smmpg_b736', 'smmpg_b768', 'smmpg_b768k', 'smmpg_b7k', 'smmpg_b800', 'smmpg_b832', 'smmpg_b832k', 'smmpg_b864', 'smmpg_b896', 'smmpg_b896k', 'smmpg_b8k', 'smmpg_b928', 'smmpg_b96', 'smmpg_b960', 'smmpg_b960k', 'smmpg_b992', 'smmpg_b9k', 'smmpg_ccb', 'smmpg_msb', 'smmpg_twomb', 'total-pages', 'total-size')

before = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)

intermediate = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)

after = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)

x_locations = numpy.arange(len(bx))
width = 0.27
fig = plt.figure(figsize=(50, 20))
ax = fig.add_subplot(111)

before_test_mempools_bar = ax.bar(x_locations, list(before), width, color='r')
intermediate_test_mempools_bar = ax.bar(x_locations + width, list(intermediate), width, color='g')
after_test_mempools_bar = ax.bar(x_locations + width * 2, list(after), width, color='b')
ax.set_ylabel('Memory')

ax.set_xticks(x_locations + width)
ax.set_xticklabels(bx, rotation=90)
ax.legend((before_test_mempools_bar[0], intermediate_test_mempools_bar[0], after_test_mempools_bar[0]), ('BEFORE', 'INTERMEDIATE', 'AFTER'))

fig.savefig("plot.png")
plt.close()

The above code produces the following plot:

Goal: I want to fit all the data into a plot that is visually clean, so that any tester on the team can analyze it. Currently, it's hard to see what happened to the smaller values.

One possible approach would be normalization, but I'm not sure whether the original values would be preserved. Any suggestions are appreciated.

Use a logarithmic y-axis, i.e. instead of plot() use semilogy(): matplotlib.org/api/_as_gen/matplotlib.pyplot.semilogy.html. You can change the base depending on the dynamic range you need to display.
– alkasm, Commented Apr 20, 2019 at 20:33

Alec · Accepted Answer · 2019-04-21 06:29:11Z

1

Transcribing @Alexander Reynold's comment into an answer:

Use a logarithmic y-axis, i.e. instead of plot() use semilogy() – You can change the base depending on what the dynamic range you need to display is.
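Since the question draws bars rather than lines, the same idea can be applied by switching the axis scale after plotting. The following is only a minimal sketch, not the asker's actual script: the labels and values are a small illustrative subset loosely based on the question's data, and the two-series layout is an assumption made to keep it short.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the question
import numpy
import matplotlib.pyplot as plt

# Illustrative subset (assumption): a few categories spanning several decades.
labels = ('smmpg_b18k', 'smmpg_b192', 'smmpg_b26k', 'smmpg_twomb', 'total-size')
before = (44.0, 76.0, 170.0, 26226.0, 27626.0)
after = (44.0, 76.0, 170.0, 26226.0, 27626.0)

x = numpy.arange(len(labels))
width = 0.4

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(x, before, width, color='r', label='BEFORE')
ax.bar(x + width, after, width, color='b', label='AFTER')

# Switch the y-axis to a logarithmic scale after the bars are drawn.
ax.set_yscale('log')

ax.set_ylabel('Memory (log scale)')
ax.set_xticks(x + width / 2)
ax.set_xticklabels(labels, rotation=90)
ax.legend()
fig.tight_layout()
```

The figure can then be written out with fig.savefig(...) as in the question. To change the base, recent matplotlib versions (3.3 and later, as far as I know) accept ax.set_yscale('log', base=2); older releases used a differently named keyword, so check the documentation for your version.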

edited Apr 21, 2019 at 6:29

answered Apr 20, 2019 at 20:42

Alec


2 Comments

godot Over a year ago

Why didn't you first suggest that Alexander Reynold turn his comment into an answer before doing it yourself?

alkasm Over a year ago

The above comment is good to keep in mind generally, but I'm okay with you putting it into an answer.

CHINTAN VADGAMA · Accepted Answer · 2019-04-23 02:27:25Z

1

I didn't know that the bar function already has a parameter to change the scale of the y-axis.

After adding the log=True argument to all the bar calls, as below,

before_test_mempools_bar = ax.bar(x_locations, list(before_test_mempools), width, color='r', log=True)
intermediate_test_mempools_bar = ax.bar(x_locations + width, list(intermediate_test_mempools), width, color='g', log=True)
after_test_mempools_bar = ax.bar(x_locations + width * 2, list(after_test_mempools), width, color='b', log=True)

my plot looks much nicer now and is easy to analyze.
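One caveat worth keeping in mind with log=True: zero has no position on a logarithmic axis, so categories whose value is 0.0 simply show no bar. If that matters, a symmetric log scale ('symlog') is one possible alternative, since it is linear near zero and logarithmic beyond a threshold. This is a different technique from the one in this answer, shown side by side purely as a sketch; the five categories are an illustrative subset, and the linthresh keyword assumes a reasonably recent matplotlib (3.3 or later, to my knowledge).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the question
import numpy
import matplotlib.pyplot as plt

# Illustrative subset (assumption), including one category that is 0.0.
labels = ('smmpg_b1024k', 'smmpg_b10k', 'smmpg_b18k', 'smmpg_b26k', 'total-size')
values = (0.0, 2.0, 44.0, 170.0, 27626.0)
x = numpy.arange(len(labels))

fig, (ax_log, ax_symlog) = plt.subplots(1, 2, figsize=(10, 4))

# Left: the approach from this answer, log=True passed directly to bar().
ax_log.bar(x, values, 0.6, color='r', log=True)
ax_log.set_title('bar(..., log=True)')

# Right: symlog is linear within [-linthresh, linthresh] and logarithmic
# outside that band, so a value of 0 still has a place on the axis.
ax_symlog.bar(x, values, 0.6, color='b')
ax_symlog.set_yscale('symlog', linthresh=1.0)
ax_symlog.set_title("set_yscale('symlog')")

for ax in (ax_log, ax_symlog):
    ax.set_xticks(x)
    ax.set_xticklabels(labels, rotation=90)
    ax.set_ylabel('Memory')
fig.tight_layout()
```

Whether the extra linear region is worth it depends on how important the zero and near-zero categories are to whoever reads the plot.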

answered Apr 23, 2019 at 2:27

CHINTAN VADGAMA


godot · Accepted Answer · 2019-04-20 21:50:43Z

If I may, I think your problem is not technical. Rather, you haven't thought enough about what you want to show and what you want people to look at, because the graphic you're showing seems to have a lot of "noise", i.e. areas of the graphic that give little or no information.

So, even if you only provided simulated data, there seems to be some room for improvement to make a much more readable and "to the point" visualization.

For example you could:

- Remove uninteresting information (maybe the categories at 0.0, or those that haven't changed?).
- Regroup some categories (what about creating new aggregated categories, or showing the data in a totally different way, with values on the x-axis and category names on the y-axis?).
- Also, maybe you're mixing different kinds of things: shouldn't the last 3 bx categories ('smmpg_twomb', 'total-pages' and 'total-size') go in a graph of their own?
- Use a data structure like pandas' DataFrame to better handle and clean your data in order to apply the three previous suggestions.
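To make the suggestions above a bit more concrete, here is a rough sketch of dropping the all-zero categories and giving the three large categories a panel of their own. It assumes pandas is installed; the seven rows are an illustrative subset rather than the full dataset, and names such as summary_names, pools and summary are made up for this example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the question
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative subset (assumption) with the category name as the index.
df = pd.DataFrame({
    'bx': ['smmpg_b1024k', 'smmpg_b10k', 'smmpg_b18k', 'smmpg_b26k',
           'smmpg_twomb', 'total-pages', 'total-size'],
    'before': [0.0, 2.0, 44.0, 170.0, 26226.0, 13813.0, 27626.0],
    'after': [0.0, 2.0, 44.0, 170.0, 26226.0, 13813.0, 27626.0],
}).set_index('bx')

# Categories that live on a much larger scale get their own panel.
summary_names = ['smmpg_twomb', 'total-pages', 'total-size']
is_summary = df.index.isin(summary_names)
summary = df[is_summary]
pools = df[~is_summary]

# Keep a pool only if at least one of its values is non-zero.
pools = pools[(pools != 0).any(axis=1)]

# Two panels, each with its own y-axis range.
fig, (ax_pools, ax_summary) = plt.subplots(
    1, 2, figsize=(10, 4), gridspec_kw={'width_ratios': [2, 1]})
pools.plot.bar(ax=ax_pools, color=['r', 'b'])
summary.plot.bar(ax=ax_summary, color=['r', 'b'])
ax_pools.set_ylabel('Memory')
ax_pools.set_title('Pools with non-zero values')
ax_summary.set_title('Large categories')
fig.tight_layout()
```

Because each panel scales its y-axis independently, the small pools stay readable on a linear axis without the large categories flattening them.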

These are just a few suggestions, but maybe they will help.

Here is an example of what you could do, just to illustrate:

import numpy
import pandas as pd  # needed for pd.DataFrame below
import matplotlib.pyplot as plt
# matplotlib.use('Agg') is omitted on purpose: Agg is a non-interactive backend,
# so the plt.show() call at the end would not open a window with it
bx = ('smmpg_b1024k', 'smmpg_b10k', 'smmpg_b11k', 'smmpg_b128', 'smmpg_b128k', 'smmpg_b12k', 'smmpg_b13k',
      'smmpg_b14k', 'smmpg_b15k', 'smmpg_b160', 'smmpg_b16k', 'smmpg_b17k', 'smmpg_b18k', 'smmpg_b192',
      'smmpg_b192k', 'smmpg_b19k', 'smmpg_b1k', 'smmpg_b20k', 'smmpg_b21k', 'smmpg_b224', 'smmpg_b22k',
      'smmpg_b23k', 'smmpg_b24k', 'smmpg_b256', 'smmpg_b256k', 'smmpg_b25k', 'smmpg_b26k', 'smmpg_b27k',
      'smmpg_b288', 'smmpg_b28k', 'smmpg_b29k', 'smmpg_b2k', 'smmpg_b30k', 'smmpg_b31k', 'smmpg_b32',
      'smmpg_b320', 'smmpg_b320k', 'smmpg_b32k', 'smmpg_b33k', 'smmpg_b34k', 'smmpg_b352', 'smmpg_b35k',
      'smmpg_b36k', 'smmpg_b37k', 'smmpg_b384', 'smmpg_b384k', 'smmpg_b38k', 'smmpg_b39k', 'smmpg_b3k',
      'smmpg_b40k', 'smmpg_b416', 'smmpg_b41k', 'smmpg_b42k', 'smmpg_b43k', 'smmpg_b448', 'smmpg_b448k',
      'smmpg_b44k', 'smmpg_b45k', 'smmpg_b46k', 'smmpg_b47k', 'smmpg_b480', 'smmpg_b48k', 'smmpg_b49k',
      'smmpg_b4k', 'smmpg_b50k', 'smmpg_b512', 'smmpg_b512k', 'smmpg_b51k', 'smmpg_b52k', 'smmpg_b53k',
      'smmpg_b544', 'smmpg_b54k', 'smmpg_b55k', 'smmpg_b56k', 'smmpg_b576', 'smmpg_b576k', 'smmpg_b57k',
      'smmpg_b58k', 'smmpg_b59k', 'smmpg_b5k', 'smmpg_b608', 'smmpg_b60k', 'smmpg_b61k', 'smmpg_b62k',
      'smmpg_b63k', 'smmpg_b64', 'smmpg_b640', 'smmpg_b640k', 'smmpg_b64k', 'smmpg_b672', 'smmpg_b6k',
      'smmpg_b704', 'smmpg_b704k', 'smmpg_b736', 'smmpg_b768', 'smmpg_b768k', 'smmpg_b7k', 'smmpg_b800',
      'smmpg_b832', 'smmpg_b832k', 'smmpg_b864', 'smmpg_b896', 'smmpg_b896k', 'smmpg_b8k', 'smmpg_b928',
      'smmpg_b96', 'smmpg_b960', 'smmpg_b960k', 'smmpg_b992', 'smmpg_b9k', 'smmpg_ccb', 'smmpg_msb',
      'smmpg_twomb', 'total-pages', 'total-size')

before = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)
intermediate = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)
after = (0.0, 2.0, 2.0, 4.0, 8.0, 2.0, 2.0, 2.0, 2.0, 6.0, 2.0, 4.0, 44.0, 76.0, 6.0, 2.0, 2.0, 2.0, 18.0, 2.0, 18.0, 30.0, 32.0, 2.0, 12.0, 2.0, 170.0, 0.0, 4.0, 2.0, 0.0, 24.0, 0.0, 2.0, 10.0, 2.0, 12.0, 2.0, 36.0, 0.0, 2.0, 0.0, 0.0, 0.0, 12.0, 22.0, 2.0, 0.0, 272.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 8.0, 2.0, 0.0, 2.0, 2.0, 6.0, 0.0, 0.0, 0.0, 34.0, 2.0, 0.0, 2.0, 0.0, 2.0, 92.0, 2.0, 0.0, 2.0, 2.0, 40.0, 2.0, 0.0, 2.0, 2.0, 0.0, 14.0, 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 18.0, 2.0, 28.0, 4.0, 0.0, 2.0, 2.0, 6.0, 214.0, 26226.0, 13813.0, 27626.0)

# Put your data in a DataFrame:
df = pd.DataFrame({'before': before,
     'intermediate': intermediate,
     'after': after, 'bx': bx,
     'x_locations':  numpy.arange(len(bx))
})

# filter rows - you can put these categories in another graph!
df_filt_cat = df.loc[~df.bx.isin(['smmpg_twomb', 'total-pages', 'total-size'])]

# drop categories that stay at 0 all the way (keep a row if any of the three values is non-zero)
df_filt_zero = df_filt_cat.loc[(df_filt_cat.before != 0) | (df_filt_cat.intermediate != 0) | (df_filt_cat.after != 0)]

width = 0.27
fig = plt.figure(figsize=(50, 20))
ax = fig.add_subplot(111)

before_test_mempools_bar = ax.bar(df_filt_zero.x_locations, df_filt_zero.before, width, color='r')
intermediate_test_mempools_bar = ax.bar(df_filt_zero.x_locations + width, df_filt_zero.intermediate, width, color='g')
after_test_mempools_bar = ax.bar(df_filt_zero.x_locations + width * 2, df_filt_zero.after, width, color='b')

ax.set_ylabel('Memory')
# label only the categories that survived the filtering
ax.set_xticks(df_filt_zero.x_locations + width)
ax.set_xticklabels(df_filt_zero.bx, rotation=90)
ax.legend((before_test_mempools_bar[0], intermediate_test_mempools_bar[0], after_test_mempools_bar[0]), ('BEFORE', 'INTERMEDIATE', 'AFTER'))

# just to show the result I commented this line
#fig.savefig("plot.png")
# and put this one instead:
plt.show()

It obviously still needs improvement but it's already a bit more readable.
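One further variation along the same lines, again only as a sketch: putting the category names on the y-axis and sorting by value tends to make long labels easier to read than rotated x tick labels. The six categories below are an illustrative subset, and the choice to sort by the "before" values is an assumption.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the question
import numpy
import matplotlib.pyplot as plt

# Illustrative subset (assumption) of non-zero categories.
labels = numpy.array(['smmpg_b10k', 'smmpg_b18k', 'smmpg_b192',
                      'smmpg_b26k', 'smmpg_b2k', 'smmpg_b3k'])
before = numpy.array([2.0, 44.0, 76.0, 170.0, 24.0, 272.0])
after = numpy.array([2.0, 44.0, 76.0, 170.0, 24.0, 272.0])

# Sort categories by their "before" value so the bars form a ranking.
order = numpy.argsort(before)
y = numpy.arange(len(labels))
height = 0.4

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(y - height / 2, before[order], height, color='r', label='BEFORE')
ax.barh(y + height / 2, after[order], height, color='b', label='AFTER')

ax.set_yticks(y)
ax.set_yticklabels(labels[order])
ax.set_xlabel('Memory')
ax.legend()
fig.tight_layout()
```

With well over a hundred categories the figure height would need to grow accordingly, so this probably works best after filtering or aggregating first.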
