
This might seem naive, but I just want to check that my understanding is correct.

To plot directly from a pandas DataFrame, my first option is to call the plot() method and pass in x, y, and the kind of plot I want to make. Alternatively, I can access df.plot as an object (a pandas.plotting._core.PlotAccessor) and then call the appropriate plotting method on it, such as bar() or box().

So, syntactically I have

df.plot(x=x, y=y, kind='something') # call plot() directly, OR
df.plot.something(x=x, y=y) # access the plot accessor, then call its method

If the claims above are correct, then why don't I get what I intend for univariate plots (hist, box, etc.)? It works perfectly fine for bivariate ones.

import pandas as pd

df = pd.DataFrame({'col1':[1,2,3,4], 'col2':[3,3,5,5], 'col3':[10,11,12,13]})
df.plot(x='col1', kind='hist') # or
df.plot.hist(x='col2')

gives a graph like this:

[image: the resulting histogram]

I understand that, according to the pandas documentation, I should be plotting a Series by selecting col2, but then what is the purpose of x and y?

Also, this works as expected for bivariate plots like

df.plot.scatter(x='col1', y='col3')


What am I missing? Any help is appreciated. Thanks in advance.

1 Answer


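From this resource, calling .plot() with no arguments plots every numeric column against the index, while passing arguments to .plot() lets you specify which columns to plot, alone or against other columns. Without the parentheses, df.plot is just the accessor object, whose methods (df.plot.hist(), df.plot.scatter(), and so on) correspond to the kind argument.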
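To make the plot versus plot() distinction concrete, here is a minimal sketch. It assumes a recent pandas with the default matplotlib backend; the Agg backend and the variable names (accessor_name, ax_call, ax_method) are my own choices for illustration, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [3, 3, 5, 5], 'col3': [10, 11, 12, 13]})

# Without parentheses, df.plot is an accessor object, not a plot.
accessor_name = type(df.plot).__name__
print(accessor_name)

# Draw the same scatter plot both ways, each on its own axes.
fig, (ax_call, ax_method) = plt.subplots(1, 2)
df.plot(x='col1', y='col3', kind='scatter', ax=ax_call)
df.plot.scatter(x='col1', y='col3', ax=ax_method)

# Each scatter call adds one point collection; compare the plotted points.
points_call = ax_call.collections[0].get_offsets().tolist()
points_method = ax_method.collections[0].get_offsets().tolist()
print(points_call == points_method)
```

If the two forms really are interchangeable, the last line should report that both axes hold the same points.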

The reason you weren't getting what you expected for the univariate histogram is that the x parameter of DataFrame.plot.hist() is not used the way you expect.

To get the result I assume you want, a histogram of a single variable, use the y parameter instead: y selects the column whose values are binned, and the vertical axis of the plot shows the count in each bin.

df.plot.hist(y='col2')
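As a rough check, passing y='col2' should give the same bars as selecting the Series first, which is the approach the question mentions from the documentation. This sketch assumes the matplotlib backend with the default number of bins; the explicit axes and the names heights_df and heights_series are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [3, 3, 5, 5], 'col3': [10, 11, 12, 13]})

# Plot the same column two ways, each on its own axes.
fig, (ax_df, ax_series) = plt.subplots(1, 2)
df.plot.hist(y='col2', ax=ax_df)      # DataFrame API, column chosen with y
df['col2'].plot.hist(ax=ax_series)    # Series API, column chosen by selection

# Each bar of the histogram is a rectangle patch; its height is the bin count.
heights_df = [bar.get_height() for bar in ax_df.patches]
heights_series = [bar.get_height() for bar in ax_series.patches]
print(heights_df == heights_series)
```

So y is mostly a convenience: it lets you stay in the DataFrame API while picking one column.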


When you set x='col2', pandas uses col2 as the x values (in effect, the index) and plots a histogram of the values in the remaining columns, basically treating col1 and col3 as y. That is why df.plot.hist(x='col2') gives you a combined histogram of the col1 and col3 values of your DataFrame.
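The effect of x described above can be reproduced by hand. In this sketch, set_index is my stand-in for what pandas appears to do with x (an assumption about the mechanism, not a quote from its source), and the default of 10 bins per column is assumed; the names remaining_columns and heights are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [3, 3, 5, 5], 'col3': [10, 11, 12, 13]})

# Moving col2 into the index leaves only the other columns as data.
remaining_columns = df.set_index('col2').columns.tolist()
print(remaining_columns)

# With x='col2', the histogram is drawn for the remaining columns.
fig, ax = plt.subplots()
df.plot.hist(x='col2', ax=ax)

# One rectangle per bin per plotted column; heights are the bin counts.
heights = [bar.get_height() for bar in ax.patches]
print(len(ax.patches), sum(heights))
```

The bar count and the total of the bin counts show that two columns of four values each were binned, and that col2 itself was not.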



2 Comments

Makes total sense. Thanks a ton. Can you also comment on my deductions about plot and plot()?
Glad my answer helped! I also added what I think is the best way to understand the difference between plot and plot()
