I know how to plot a histogram when individual data points are given, like (33, 45, 54, 33, 21, 29, 15, ...), by simply using something like matplotlib.pyplot.hist(x, bins=10)

but what if I only have grouped data like:

| Marks | Number of students |
| ----- | ------------------ |
| 0-10  | 8                  |
| 10-20 | 12                 |
| 20-30 | 24                 |
| 30-40 | 26                 |
| ...   | ...                |

I know that I can use a bar plot to mimic a histogram by changing the xticks, but what if I want to do this using only the hist function of matplotlib.pyplot?

Is it possible to do this?

1
  • The table formatting was not working properly so I used an image instead Commented Apr 10, 2021 at 8:44

2 Answers

2

You can build the hist() params manually and use the existing value counts as weights.

Say you have this df:

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame({'Marks': ['0-10', '10-20', '20-30', '30-40'], 'Number of students': [8, 12, 24, 26]})
>>> df
   Marks  Number of students
0   0-10                   8
1  10-20                  12
2  20-30                  24
3  30-40                  26

The bins are all the unique boundary values in Marks:

>>> bins = pd.unique(df.Marks.str.split('-', expand=True).astype(int).values.ravel())
>>> bins
array([ 0, 10, 20, 30, 40])

Choose one x value per bin, e.g. the left edge to make it easy:

>>> x = bins[:-1]
>>> x
array([ 0, 10, 20, 30])

Use the existing value counts (Number of students) as weights:

>>> weights = df['Number of students'].values
>>> weights
array([ 8, 12, 24, 26])

Then plug these into hist():

>>> plt.hist(x=x, bins=bins, weights=weights)

[Image: reconstructed histogram]
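For reference, the same weights idea can be sketched as a standalone script without pandas. This is only a minimal sketch: the bin edges and counts are typed in by hand from the question's table, the names `bin_edges`, `counts` and `midpoints` are made up for illustration, and bin midpoints are used as the x values instead of left edges (any value that falls inside its bin should work).

```python
import matplotlib.pyplot as plt
import numpy as np

# Grouped data from the question: bin edges and the count in each bin.
bin_edges = np.array([0, 10, 20, 30, 40])
counts = np.array([8, 12, 24, 26])

# One representative x value per bin; the midpoint lies strictly inside its bin.
midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

fig, ax = plt.subplots()
# Each x value is counted `weight` times, so the bar heights equal the counts.
heights, edges, _ = ax.hist(midpoints, bins=bin_edges, weights=counts,
                            edgecolor="black")

# Put the ticks on the class interval boundaries.
ax.set_xticks(bin_edges)
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. The returned `heights` should match the original counts, which is a quick way to check that no x value landed in the wrong bin.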


1 Comment

An interesting feature to display in a histogram is the class interval boundaries. Displaying them with matplotlib adds much more information.
0

One possibility is to “ungroup” data yourself.

For example, for the 8 students with a mark between 0 and 10, you can generate 8 data points of value 5 (the midpoint of the interval). For the 12 with a mark between 10 and 20, you can generate 12 data points of value 15.
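One way to do this "ungrouping" is with `numpy.repeat`, which repeats each midpoint as many times as its count. The sketch below assumes the four intervals from the question; the variable names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Grouped data from the question.
bin_edges = np.array([0, 10, 20, 30, 40])
counts = np.array([8, 12, 24, 26])

# Midpoint of each interval: 5, 15, 25, 35.
midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

# Repeat each midpoint once per student in that interval.
ungrouped = np.repeat(midpoints, counts)

fig, ax = plt.subplots()
# Reuse the original edges as bins so each synthetic point falls in its own interval.
heights, edges, _ = ax.hist(ungrouped, bins=bin_edges, edgecolor="black")
```

Passing the original edges as `bins` matters here: with a default such as `bins=10`, the synthetic points would be binned on a different grid and the picture would no longer match the table.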

However, the “ungrouped” data will only be an approximation of the real data. Thus, it is probably better to just use a matplotlib.pyplot.bar plot.
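A minimal sketch of the bar plot alternative, again assuming the four intervals from the question and illustrative variable names: each bar starts at the left edge of its interval and is as wide as the interval, so the result looks like a histogram.

```python
import matplotlib.pyplot as plt
import numpy as np

# Grouped data from the question.
bin_edges = np.array([0, 10, 20, 30, 40])
counts = np.array([8, 12, 24, 26])

# Width of each interval (10 for every bin here, but unequal widths work too).
widths = np.diff(bin_edges)

fig, ax = plt.subplots()
# align="edge" anchors each bar at its left edge instead of centering it on x.
bars = ax.bar(bin_edges[:-1], counts, width=widths, align="edge",
              edgecolor="black")

ax.set_xticks(bin_edges)
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
```

This draws the counts directly, so no synthetic data points are needed.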
