1

I have an array that has different values, some of which are duplicates. How can I draw a histogram for them whose horizontal axis is the name of the element and the vertical axis is the number in the array?

arr= ['a','a','a','b','c','b']
  • Do you want ASCII art, or do you want graphical output? Do you know how to produce the counts? Commented Sep 21, 2021 at 20:47

4 Answers

1

Note that matplotlib's hist does not play nicely with string data (the bars are not centered on their tick labels):

import matplotlib.pyplot as plt

plt.hist(arr)

It's certainly possible to fix this manually, but it's easier to use pandas or seaborn. Both use matplotlib under the hood, but they provide better default formatting.
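As a rough sketch of what a manual fix could look like (one possible approach, not necessarily the one meant above): matplotlib places each distinct string at an integer position 0, 1, 2, ..., so bin edges halfway between those integers give one centered bar per category. The names `n_categories` and `edges` and the `rwidth=0.8` spacing are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

arr = ['a', 'a', 'a', 'b', 'c', 'b']

# matplotlib puts the categories at x = 0, 1, 2, ... in order of first appearance,
# so edges at -0.5, 0.5, 1.5, ... place exactly one category in each bin.
n_categories = len(set(arr))
edges = np.arange(n_categories + 1) - 0.5

fig, ax = plt.subplots(figsize=(6, 3))
counts, _, _ = ax.hist(arr, bins=edges, rwidth=0.8)  # rwidth leaves a gap between bars
```

Outside a notebook, call plt.show() afterwards to display the figure.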

Also:

  • If there are too many bars to fit comfortably in the default frame, you can widen the figsize. In these examples I've set figsize=(6, 3).
  • If you want to rotate the x ticks, add plt.xticks(rotation=90).
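For instance, with many distinct labels the two tips above might be combined as in the sketch below. The data is made up for illustration (40 hypothetical labels named `item_00` to `item_39`), and sorting the bars by frequency is an extra choice, not something required.

```python
import random
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical data with many distinct labels, generated only for this example.
random.seed(0)
labels = [f"item_{i:02d}" for i in range(40)]
arr = random.choices(labels, k=500)

counts = Counter(arr)
names, values = zip(*counts.most_common())  # most frequent label first

fig, ax = plt.subplots(figsize=(12, 3))      # wider than (6, 3) to fit all bars
ax.bar(names, values)
ax.tick_params(axis="x", labelrotation=90)   # comparable to plt.xticks(rotation=90)
fig.tight_layout()                           # keep the rotated labels inside the figure
```

If the plot is still crowded, slicing `counts.most_common(20)` would keep only the 20 most frequent labels.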

pandas

  • pandas value_counts and plot.bar

    import pandas as pd
    
    # pd.value_counts(arr) is deprecated since pandas 2.1; call value_counts on a Series instead
    pd.Series(arr).value_counts().plot.bar(figsize=(6, 3))
    


seaborn

  • seaborn histplot

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(figsize=(6, 3))
    sns.histplot(arr, ax=ax)
    

  • seaborn countplot

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(figsize=(6, 3))
    # pass the values as x=; newer seaborn versions read the first positional argument as data
    sns.countplot(x=arr, ax=ax)
    


matplotlib

  • collections.Counter with matplotlib bar

    import matplotlib.pyplot as plt
    from collections import Counter
    
    counts = Counter(arr)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(counts.keys(), counts.values())
    

  • numpy unique with matplotlib bar

    import matplotlib.pyplot as plt
    import numpy as np
    
    uniques, counts = np.unique(arr, return_counts=True)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(uniques, counts)
    


1

You can use the matplotlib library to plot a histogram directly from a list. The code for it goes as follows:

from matplotlib import pyplot as plt

arr= ['a','a','a','b','c','b']

plt.hist(arr)
plt.show()

You can read more about matplotlib's histogram function here: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

You can also customize the plot, for example by setting the bar color or changing the alignment.

2 Comments

Thanks, but because my arrays contain a large number of entries, the labels on plots made with this library are not very readable. Is there a way to solve this problem?
From what you have mentioned, I believe that you have big data or a distributed environment. If that is the case, you can use PySpark to analyze your data across the distributed environment, count the unique elements, and plot a histogram. But if it is just arrays (called lists in Python), you can use multiprocessing, get the unique elements by converting the array to a set with set(arr), create a dictionary with those elements as keys and 0 as the default value, iterate over the lists, update the counts, and plot a histogram from that dictionary using matplotlib.
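The single-process counting steps described in the last comment could look roughly like the sketch below. The helper name `count_items` is made up for illustration, and `collections.Counter` would do the same job in one call.

```python
import matplotlib.pyplot as plt


def count_items(items):
    """Count occurrences of each distinct element using a plain dict."""
    counts = {key: 0 for key in set(items)}  # every unique element starts at 0
    for item in items:
        counts[item] += 1
    return counts


arr = ['a', 'a', 'a', 'b', 'c', 'b']
counts = count_items(arr)

# A set has no defined order, so sort the labels for a stable x axis.
labels = sorted(counts)
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(labels, [counts[label] for label in labels])
```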
0

Histograms are meant for numerical data. For string (categorical) data like yours, a bar chart is a better choice. Here's the code to generate a bar chart with matplotlib:

import matplotlib.pyplot as plt
from collections import Counter
arr= ['a', 'a', 'a', 'b', 'c', 'b']
data = Counter(arr)
plt.bar(data.keys(), data.values())
plt.show()

(Image: bar chart produced by the code above)

Even though the histogram and the bar chart look similar in this case, a histogram can give unexpected results, for instance if you request a specific number of bins.
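To illustrate the kind of surprise meant here, a small sketch: asking hist for two bins on this three-category data merges 'b' and 'c' into a single bar, because matplotlib bins the integer positions it assigns to the strings rather than the strings themselves.

```python
import matplotlib.pyplot as plt

arr = ['a', 'a', 'a', 'b', 'c', 'b']

fig, ax = plt.subplots()
# The strings sit at positions a=0, b=1, c=2; two bins over [0, 2]
# become [0, 1) and [1, 2], so 'b' and 'c' fall into the same bin.
counts, edges, _ = ax.hist(arr, bins=2)
print(counts)  # [3. 3.]
```

A bar chart built from explicit counts has no bins parameter, so each category always keeps its own bar.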


0

There are multiple steps to this problem.

Step 1: You need to collect the data in a convenient form. Going by your example, a good option would be to build a list of counts, one per distinct value. You could use the list method .count() for that. Other methods are possible, of course.

Step 2: To display the data, you could use a library like matplotlib.pyplot. It may also be able to take care of step 1, but that is not important.
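A minimal sketch of these two steps, assuming the example list from the question; the axis labels are illustrative choices.

```python
import matplotlib.pyplot as plt

arr = ['a', 'a', 'a', 'b', 'c', 'b']

# Step 1: one list.count() call per distinct value.
labels = sorted(set(arr))
counts = [arr.count(label) for label in labels]

# Step 2: draw one bar per distinct value.
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(labels, counts)
ax.set_xlabel("element")
ax.set_ylabel("count")
```

Each .count() call scans the whole list, so with many distinct values a single pass with collections.Counter is likely to be faster.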

2 Comments

Thanks, but because my arrays contain a large number of entries, the labels on plots made with this library are not very readable. Is there a way to solve this problem?
But if you first reduce it by collecting the info like in step one, is that then still too much to work with in step 2?
