0

Explanation:

I have two numpy arrays: dataX and dataY, and I am trying to filter each array to reduce the noise. The image shown below shows the actual input data (blue dots) and an example of what I want it to be like(red dots). I do not need the filtered data to be as perfect as in the example but I do want it to be as straight as possible. I have provided sample data in the code. enter image description here

What I have tried:

Firstly, you can see that the data isn't 'continuous', so I first divided them into individual 'segments' ( 4 of them in this example), and then applied a filter to each 'segment'. Someone suggested that I use a Savitzky-Golay filter. The full, run-able code is below:

import scipy as sc
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt

# Sample Data
ydata = np.array([1,0,1,2,1,2,1,0,1,1,2,2,0,0,1,0,1,0,1,2,7,6,8,6,8,6,6,8,6,6,8,6,6,7,6,5,5,6,6, 10,11,12,13,12,11,10,10,11,10,12,11,10,10,10,10,12,12,10,10,17,16,15,17,16, 17,16,18,19,18,17,16,16,16,16,16,15,16])
xdata = np.array([1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32,33, 1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32])

# Used a diff array to find where there is a big change in Y. 
# If there's a big change in Y, then there must be a change of 'segment'.
diffy = np.diff(ydata)

# Create empty numpy arrays to append values into
filteredX = np.array([])
filteredY = np.array([])

# Chose 3 to be the value indicating the change in Y
index = np.where(diffy >3)

# Loop through the array
start = 0
for i in range (0, (index[0].size +1) ):
# Check if last segment is reached
    if i == index[0].size:
        print xdata[start:]
        partSize = xdata[start:].size
# Window length must be an odd integer
        if partSize % 2 == 0:
            partSize = partSize - 1

        filteredDataX = sc.signal.savgol_filter(xdata[start:], partSize, 3)
        filteredDataY = sc.signal.savgol_filter(ydata[start:], partSize, 3)
        filteredX = np.append(filteredX, filteredDataX)
        filteredY = np.append(filteredY, filteredDataY)

    else:
        print xdata[start:index[0][i]]
        partSize = xdata[start:index[0][i]].size
        if partSize % 2 == 0:
            partSize = partSize - 1
        filteredDataX = sc.signal.savgol_filter(xdata[start:index[0][i]], partSize, 3)
        filteredDataY = sc.signal.savgol_filter(ydata[start:index[0][i]], partSize, 3)
        start = index[0][i] 
        filteredX = np.append(filteredX, filteredDataX)
        filteredY = np.append(filteredY, filteredDataY)

# Plots
plt.plot(xdata,ydata, 'bo', label = 'Input Data')
plt.plot(filteredX, filteredY, 'ro', label = 'Filtered Data')

plt.xlabel('X')
plt.ylabel('Y')
plt.title('Result')
plt.legend()
plt.show()

This is my result: enter image description here When each point is connected, the result looks as follows. enter image description here I have played around with the order, but it seems like a third order gave the best result.

I have also tried these filters, among a few others:

But so far none of the filters I have tried were close to what I really wanted. What is the best way to filter data such as this? Looking forward to your help.

1 Answer 1

3

One way to get something looking close to your ideal would be clustering + linear regression.

Note that you have to provide the number of clusters and I also cheated a bit in scaling up y before clustering.enter image description here

import numpy as np
from scipy import cluster, stats

ydata = np.array([1,0,1,2,1,2,1,0,1,1,2,2,0,0,1,0,1,0,1,2,7,6,8,6,8,6,6,8,6,6,8,6,6,7,6,5,5,6,6, 10,11,12,13,12,11,10,10,11,10,12,11,10,10,10,10,12,12,10,10,17,16,15,17,16, 17,16,18,19,18,17,16,16,16,16,16,15,16])
xdata = np.array([1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32,33, 1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32])

def split_to_lines(x, y, k):
    yo = np.empty_like(y, dtype=float)
    # get the cluster centers and the labels for each point
    centers, map_ = cluster.vq.kmeans2(np.array((x, y * 2)).T.astype(float), k)
    # for each cluster, use the labels to select the points belonging to
    # the cluster and do a linear regression
    for i in range(k):
        slope, interc, *_ = stats.linregress(x[map_==i], y[map_==i])
        # use the regression parameters to construct y values on the
        # best fit line
        yo[map_==i] = x[map_==i] * slope + interc
    return yo

import pylab
pylab.plot(xdata, ydata, 'or')
pylab.plot(xdata, split_to_lines(xdata, ydata, 4), 'ob')
pylab.show()
Sign up to request clarification or add additional context in comments.

4 Comments

Hi Paul. Thank you for taking the time to answer :) I just have a few questions: Do you always need to scale up Y or is it only specific to this case? If so, how do I know how much to scale it by ? I am currently reading up on clustering, but it seems a bit complicated. Do you mind adding on comments to your code to explain briefly what each line in the "split_to_lines" function does?
@user3451660 clustering is as far as I know---and I am no expert---not an exact science. But I think it works best when the clusters are kind of roundish. Now, in your data, the clusters are elongated in x direction, so I stretched y as a quick fix. Sorry I can't be more specific. I'll see what I can do about the commenting
Ok, thank you! So technically only the Ydata is being filtered ? If I had to connect each point, (see additional image added) my original shows X increasing until there is a big decrease in X, whereas yours is slightly different. Also, with your method, I would also need to calculate how many lines there are to split (there were 4 in the example but could be many more)?
@user3451660 Yes and yes. But you can simply sort the points within each cluster.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.