
What about a pandas Series is causing this?

plt.plot(df["Column"].as_matrix())

(image: good plot)


plt.plot(df["Column"])

(image: bad plot)


df["Column"].plot()

This actually has similar artifacts, but it isn't quite the same plot.

  • It might help if you included sample data. I notice that the x-axis range changes between the two methods. I suspect that the index of the pandas.Series is causing the different behaviour. Commented Mar 9, 2017 at 0:42

1 Answer


Let's say you have the following DataFrame:

import pandas as pd

x = [2, 1, 3, 6, 5, 6, 7]
y = [1, 2, 5, 1, 1, 6, 1]
df = pd.DataFrame({"y": y}, index=x)

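Then calling plt.plot(df["y"].as_matrix()) (or plt.plot(df["y"].to_numpy()) in pandas 1.0 and later, where as_matrix() has been removed) is equivalent to plt.plot(y), which plots only the y values against their position in the array (starting at 0, incrementing by 1).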
In contrast, plt.plot(df["y"]) is equivalent to plt.plot(x, y), which plots the y values against the index of the DataFrame. If the index is not sorted, the line jumps back and forth along the x-axis and the plot looks distorted. (The same is true for the pandas plot command, df["y"].plot().)
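A minimal sketch to check this, assuming pandas 1.0+ (so to_numpy() stands in for the removed as_matrix()) and a recent matplotlib, which takes the x values from a Series' index when no x is passed. It reads back the x data matplotlib actually stored for each line with Line2D.get_xdata(); the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = [2, 1, 3, 6, 5, 6, 7]
y = [1, 2, 5, 1, 1, 6, 1]
df = pd.DataFrame({"y": y}, index=x)

# The index is not sorted, which is what makes the Series plot zig-zag.
index_sorted = df.index.is_monotonic_increasing

fig, ax = plt.subplots()
# Plain array: matplotlib generates x = 0, 1, ..., n-1.
(line_array,) = ax.plot(df["y"].to_numpy())
# Series: matplotlib takes x from the Series index.
(line_series,) = ax.plot(df["y"])

array_x = np.asarray(line_array.get_xdata()).tolist()
series_x = np.asarray(line_series.get_xdata()).tolist()
plt.close(fig)

print("sorted index?", index_sorted)
print("x for the array: ", array_x)
print("x for the Series:", series_x)
```

The array line gets evenly spaced positions, while the Series line reuses the unsorted labels 2, 1, 3, 6, ... as x values.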

Here is a complete example.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"y" : [1,2,5,1,1,6,1] }, index=[2,1,3,6,5,6,7])

plt.plot(df["y"].to_numpy(), lw=3, label='plt.plot(df["y"].to_numpy())')  # as_matrix() was removed in pandas 1.0
plt.plot(df["y"], lw=3, label='plt.plot(df["y"])')
df["y"].plot(ax=plt.gca(), linestyle="--", color="k", label='df["y"].plot()')

plt.legend()
plt.show()


The easiest way to make all of the above methods produce the same plot is to reset the DataFrame's index, so that it runs 0, 1, 2, ...:

df = df.reset_index()
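A minimal sketch of this fix, assuming pandas 1.0+ and a recent matplotlib (df_reset, df_dropped and df_sorted are illustrative names). It also shows reset_index(drop=True), which discards the old labels instead of keeping them in a column named "index", and, as an alternative not suggested in the answer above, sort_index(), which keeps the original x values but puts them in order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"y": [1, 2, 5, 1, 1, 6, 1]}, index=[2, 1, 3, 6, 5, 6, 7])

# reset_index(): the old labels move into a column named "index",
# and the new index is 0, 1, ..., n-1.
df_reset = df.reset_index()

# reset_index(drop=True): same new index, but the old labels are discarded.
df_dropped = df.reset_index(drop=True)

# Alternative: keep the original x values, but sort them so the line
# no longer jumps back and forth along the x-axis.
df_sorted = df.sort_index()

fig, ax = plt.subplots()
(line,) = ax.plot(df_reset["y"])  # x now comes from the fresh 0..n-1 index
reset_x = np.asarray(line.get_xdata()).tolist()
plt.close(fig)

print(df_reset)
print(reset_x)
```

With the reset index, plt.plot(df["y"]), df["y"].plot() and plt.plot(df["y"].to_numpy()) all use the same x values.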

3 Comments

This is more or less what happened. I had been sub-sampling my data and sorting on another column, and this jacked up my index. I had assumed that the "natural" index on the series would behave like array indexing, but apparently not. Out of curiosity, what is the reasoning behind indexing this way, as opposed to the index simply being [0, series.size)?
The reason, of course, is that each entry keeps its own defined label in the index (otherwise there would be no point in having an index). This allows you, for example, to identify the data unambiguously after sorting or filtering. You can use df = df.reset_index() to re-index the DataFrame from 0.
...Like in a relational database.
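A small sketch of what keeping labels means after filtering and sorting, loosely modelled on the sub-sample-then-sort situation described in the comments (the Series s and its values are made up for illustration). The original labels stay attached to their values, so .loc looks up by label while .iloc looks up by position.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])  # default index: 0, 1, 2, 3, 4

# Filter, then sort by value (a stand-in for "sub-sample and sort on another column").
subset = s[s != 30].sort_values(ascending=False)

labels = subset.index.tolist()  # labels travel with their values
by_label = subset.loc[0]        # label 0 still refers to the value 10
by_position = subset.iloc[0]    # position 0 is now the largest value, 50

# Renumber from 0 when only the order matters (e.g. before plotting).
renumbered = subset.reset_index(drop=True)

print(labels, by_label, by_position)
print(renumbered.index.tolist())
```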
