Why does matplotlib extrapolate/plot missing values?

Question

Sometimes a whole stretch of data is unavailable. I'm plotting sensor values in real time, and the sensors can be switched on and off by the user, so I cannot be sure the values form a continuous series. When a user starts a sensor, turns it off and later turns it on again, matplotlib draws a line from the last point before the gap to the first point after it.

The data I plotted was as follows:

[[  5.          22.57011604]
 [  6.          22.57408142]
 [  7.          22.56350136]
 [  8.          22.56394005]
 [  9.          22.56790352]
 [ 10.          22.56451225]
 [ 11.          22.56481743]
 [ 12.          22.55789757]
  #Missing x vals. Still plots straight line..
 [ 29.          22.55654716]
 [ 29.          22.56066513]
 [ 30.          22.56110382]
 [ 31.          22.55050468]
 [ 32.          22.56550789]
 [ 33.          22.56213379]
 [ 34.          22.5588932 ]
 [ 35.          22.54829407]
 [ 35.          22.56697655]
 [ 36.          22.56005478]
 [ 37.          22.5568161 ]
 [ 38.          22.54621696]
 [ 39.          22.55033493]
 [ 40.          22.55079269]
 [ 41.          22.55475616]
 [ 41.          22.54783821]
 [ 42.          22.55195618]]

My plot function, much simplified, looks like this:

def plot(self, data):
    for name, xy_dict in data.items():
        x_vals = xy_dict['x_values']
        y_vals = xy_dict['y_values']
        line_to_plot = xy_dict['line_number']
        self.lines[line_to_plot].set_xdata(x_vals)
        self.lines[line_to_plot].set_ydata(y_vals)

Does anyone know why it behaves like that? Do I have to handle non-consecutive x and y values myself when plotting? It seems matplotlib should take care of this on its own. Otherwise, do I have to split the lists into smaller lists and plot those?

Bart · Accepted Answer · 2016-07-04 08:43:15Z

4

One option would be to add dummy items wherever data is missing (in your case apparently when x changes by more than 1), and set them as masked elements. That way matplotlib skips the line segments. For example:

import numpy as np
import matplotlib.pylab as pl

# Your data, with some additional elements deleted...
data = np.array(
[[  5., 22.57011604],
 [  6., 22.57408142],
 [  9., 22.56790352],
 [ 10., 22.56451225],
 [ 11., 22.56481743],
 [ 12., 22.55789757],
 [ 29., 22.55654716],
 [ 33., 22.56213379],
 [ 34., 22.5588932 ],
 [ 35., 22.54829407],
 [ 40., 22.55079269],
 [ 41., 22.55475616],
 [ 41., 22.54783821],
 [ 42., 22.55195618]])

x = data[:,0]
y = data[:,1]

# Difference from element to element in x
dx = x[1:]-x[:-1]

# Wherever dx > 1, insert a dummy item equal to -1
x2 = np.insert(x, np.where(dx>1)[0]+1, -1)
y2 = np.insert(y, np.where(dx>1)[0]+1, -1)

# As discussed in the comments, another option is to use e.g.:
#x2 = np.insert(x, np.where(dx>1)[0]+1, np.nan)
#y2 = np.insert(y, np.where(dx>1)[0]+1, np.nan)
# and skip the masking step below.

# Mask elements which are -1
x2 = np.ma.masked_where(x2 == -1, x2)
y2 = np.ma.masked_where(y2 == -1, y2)

pl.figure()
pl.subplot(121)
pl.plot(x,y)
pl.subplot(122)
pl.plot(x2,y2)
pl.show()
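The NaN variant mentioned in the comments of the code above can be wrapped in a small helper. This is only a sketch: `insert_gaps` and its `max_step` parameter are hypothetical names chosen for illustration, and the threshold of 1 is the same assumption as above (a gap is any jump in x larger than 1).

```python
import numpy as np

def insert_gaps(x, y, max_step=1.0):
    """Return copies of x and y with NaN inserted wherever x jumps by more than max_step."""
    # Float dtype is needed because integer arrays cannot hold NaN.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index of the first sample after each gap.
    gap_at = np.where(np.diff(x) > max_step)[0] + 1
    return np.insert(x, gap_at, np.nan), np.insert(y, gap_at, np.nan)

x = [10, 11, 12, 29, 30, 31]
y = [22.5645, 22.5648, 22.5579, 22.5565, 22.5611, 22.5505]
x2, y2 = insert_gaps(x, y)  # one NaN now sits between x=12 and x=29
```

Note that a sample with a gap on both sides (such as x=29 in the reduced data set above) has no neighbour to connect to, so it only shows up if the line is drawn with markers.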

edited Jul 4, 2016 at 8:43

answered Jul 4, 2016 at 8:09

Bart


10 Comments

enrm Over a year ago

Real nice! Thanks for the tip. I'm currently looking into using np.nan to also discontinue a line, which apparently works. I will try that first and yours later.

Bart Over a year ago

Masking seems to be relatively slow. For example, on an array of random numbers, np.ma.masked_where(x<0, x) is about 5x slower than x[x<0]=np.nan for a large x.

Bart Over a year ago

I can't reproduce that, but given this answer: stackoverflow.com/questions/12708807/numpy-integer-nan, it seems that your x2 array is interpreted as an array of integers. You could try something like x2 = x2.astype(np.float64)

Bart Over a year ago

Yes, you need numpy arrays for my solution, so if you start off from Python lists, you need to convert them first (you can pass the dtype keyword, e.g. x=np.array(x, dtype=np.float64))

enrm Over a year ago

Ah, real nice. Tried it and it works like it should. Lots of thanks for your answers! <3


honza_p · Accepted Answer · 2016-07-04 08:39:36Z

3

Another option is to include None or numpy.nan as values for y.

This, for example, shows a disconnected line:

import matplotlib.pyplot as plt
plt.plot([1,2,3,4,5],[5,6,None,7,8])
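For the real-time case in the question, the same idea could be applied at the point where the data is collected: append a NaN sample whenever a sensor is switched off. The sketch below is an assumption about how such a buffer might look; `SensorSeries` and `mark_gap` are hypothetical names, not part of matplotlib.

```python
import math

class SensorSeries:
    """Hypothetical buffer holding the samples of one sensor."""

    def __init__(self):
        self.x_values = []
        self.y_values = []

    def append(self, x, y):
        self.x_values.append(x)
        self.y_values.append(y)

    def mark_gap(self):
        """Record a break in the line, e.g. when the sensor is switched off."""
        # Skip if the buffer is empty or already ends with a gap marker.
        if self.y_values and not math.isnan(self.y_values[-1]):
            self.x_values.append(math.nan)
            self.y_values.append(math.nan)

series = SensorSeries()
series.append(12, 22.55789757)
series.mark_gap()               # sensor turned off
series.append(29, 22.55654716)  # sensor turned on again
```

The resulting `x_values` and `y_values` lists can then be handed to the plotting code unchanged.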

answered Jul 4, 2016 at 8:39

honza_p


1 Comment

enrm Over a year ago

Found this out as well. I am currently trying to incorporate @Bart's answer to use np.nan, but I'm getting some errors. Regards!

Alex Kamphuis · Accepted Answer · 2016-07-04 07:10:30Z

1

Matplotlib will connect all your consecutive data points with lines.

If you want to avoid this, you could split your data at the missing x-values and plot the resulting lists separately.
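A minimal sketch of such a split, assuming the data is available as numpy arrays and that a gap is any jump in x larger than 1 (the sample values are a shortened version of the data in the question):

```python
import numpy as np

x = np.array([10.0, 11.0, 12.0, 29.0, 30.0, 31.0])
y = np.array([22.5645, 22.5648, 22.5579, 22.5565, 22.5611, 22.5505])

# Index of the first sample after each gap in x.
gap_at = np.where(np.diff(x) > 1)[0] + 1

# np.split cuts both arrays at the same positions.
segments = list(zip(np.split(x, gap_at), np.split(y, gap_at)))
```

Each `(x_part, y_part)` pair in `segments` can then be plotted with its own plot call; passing the same `color` argument to every call keeps the pieces visually consistent.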

answered Jul 4, 2016 at 7:10

Alex Kamphuis


1 Comment

enrm Over a year ago

Can I still set the x and y data as before, or do I need to plot them as new lines? (That would not be very good, since the colors etc. would change.) I would prefer the data points to stay on the same matplotlib line.
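As the other answers on this page suggest, a NaN entry breaks the line without requiring a second line object, so set_xdata and set_ydata can still be used as in the question. A minimal sketch, assuming the gap marker is already present in the data; the Agg backend is selected only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this for on-screen plotting
import matplotlib.pyplot as plt

nan = float("nan")
x_vals = [10, 11, 12, nan, 29, 30, 31]
y_vals = [22.5645, 22.5648, 22.5579, nan, 22.5565, 22.5611, 22.5505]

fig, ax = plt.subplots()
(line,) = ax.plot([], [])  # one Line2D object, created once

# Update the same line, as in the question's plot function.
line.set_xdata(x_vals)
line.set_ydata(y_vals)

# Recompute the axis limits after changing the data.
ax.relim()
ax.autoscale_view()
```

Both stretches of data stay on the one line object, so they share its color and style.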
