Currently I have a matrix of 1's, 0's, and -1's where each row is a person and each column is a bill that they voted on. The 1's, 0's, and -1's in each cell denote how they voted.

The histogram I am trying to build would show, on the Y axis, the number of people with x yes votes (the number of rows containing x 1's). The X axis would have ticks for 0 to N yes votes. So for example, if 30 people each cast 30 yes votes, the bar at the 30 label on the X axis would go up to 30 on the Y axis.

Here is a screenshot of these histograms, which I quickly made in MATLAB (where my experience with such things lies): (image: histograms built in MATLAB)

My question is how to easily and effectively do this in Python. I have very little experience with Python.

The code I have:

import matplotlib.pyplot as plt

def buildHistogram(matrix):
    plt.hist(matrix, bins = 30)
    plt.show()

Which yields: (image: histograms built in Python)

Please let me know how I can split these into three different histograms. Do I need to make three different arrays?

  • Try using pandas itself for filtering the data and then using its hist built-in: df[df.desired_column == 1].hist(bins = 30), for the Yes votes of a desired_column Commented Jul 12, 2017 at 23:34
  • Do you mean the file from which the data is pulled? It is a long text file of -1's, 1's, and 0's. @MSeifert Commented Jul 12, 2017 at 23:35
  • @ViníciusAguiar do you know if I am able to include a list of columns? I would like to see all the columns past the first 10 at once. Commented Jul 12, 2017 at 23:37
  • hmm I'm not sure how to do that, maybe @MSeifert knows a good way! =) Commented Jul 12, 2017 at 23:41
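On the question of looking at all columns past the first 10 at once: one possible approach, assuming the votes are held in a NumPy array rather than a pandas DataFrame, is to slice the columns before counting. The random `arr` below is only stand-in data, and `later_bills` is a name made up for this sketch.

```python
import numpy as np

# Stand-in data (an assumption): 200 people, 100 bills, votes in {-1, 0, 1}.
rng = np.random.default_rng(0)
arr = rng.integers(-1, 2, size=(200, 100))

# Keep every row, but only the columns from index 10 onward.
later_bills = arr[:, 10:]

# Number of yes votes per person, counted over those columns only.
yes_counts = np.sum(later_bills == 1, axis=1)
```

`yes_counts` can then be passed to `plt.hist` like any other one-dimensional array.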

1 Answer

I used a random data set to reproduce it:

import numpy as np
import matplotlib.pyplot as plt
arr = np.random.randint(-1, 2, (200, 100))

Then it's just (neglecting axis labels and titles):

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
# Count each vote value per person (row), then histogram those counts.
ax1.hist(np.sum(arr==-1, axis=1), bins=30)  # no
ax2.hist(np.sum(arr==0, axis=1), bins=30)   # nothing
ax3.hist(np.sum(arr==1, axis=1), bins=30)   # yes
plt.show()

This gives me the following, which should be roughly what you want:

(image: the three resulting histograms)
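For completeness, here is a sketch of the same idea wrapped in a function, with the axis labels and titles that were neglected above. The function name `plot_vote_histograms`, the label wording, and the figure size are my own choices for illustration, not anything required by matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_vote_histograms(votes, bins=30):
    """Draw one histogram per vote value (-1, 0, 1) from a people x bills matrix."""
    votes = np.asarray(votes)
    # Label text is an assumption; it follows the no / nothing / yes comments above.
    labels = {-1: "no", 0: "nothing", 1: "yes"}
    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
    for ax, (value, label) in zip(axes, labels.items()):
        # For every person (row), count how many bills received this vote.
        counts_per_person = np.sum(votes == value, axis=1)
        ax.hist(counts_per_person, bins=bins)
        ax.set_title(f"{label} votes")
        ax.set_xlabel(f"number of {label} votes")
    axes[0].set_ylabel("number of people")
    fig.tight_layout()
    return fig, axes


# Random stand-in data, as in the answer above.
rng = np.random.default_rng(0)
arr = rng.integers(-1, 2, size=(200, 100))
fig, axes = plot_vote_histograms(arr)
```

Call `plt.show()` afterwards to display the figure. So no, you don't need to build three separate arrays by hand; the boolean comparison `votes == value` creates each one on the fly.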

2 Comments

Weird how the "random" data generated such stripey graphs.
@DanielF I don't think these "empty stripes" are real. They are an artifact of my dataset: the values are integers, and their range can be smaller than the number of bins. For example, in the second data set the range is ~23 to ~44, so it has 21 "filled bins" and 9 "empty bins"...
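One way to avoid those artificial gaps, sketched here under the assumption that the values being histogrammed are integer counts, is to use one bin per integer by placing the bin edges at half-integers. The `edges` name and the random stand-in data are only for illustration.

```python
import numpy as np

# Stand-in data (an assumption): 200 people, 100 bills, votes in {-1, 0, 1}.
rng = np.random.default_rng(0)
arr = rng.integers(-1, 2, size=(200, 100))

# Number of yes votes per person.
yes_counts = np.sum(arr == 1, axis=1)

# One bin per integer value k: each bin spans k - 0.5 to k + 0.5.
edges = np.arange(yes_counts.min(), yes_counts.max() + 2) - 0.5

# np.histogram returns the bar heights that plt.hist would draw for these bins.
heights, _ = np.histogram(yes_counts, bins=edges)
```

Passing `bins=edges` to `ax.hist` instead of `bins=30` draws the same bars. A bar can still be empty if nobody happens to have that exact count, but the regular stripes caused by having more bins than distinct values go away.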
