
I am trying to find clusters inside an array, i.e. groups of consecutive elements where the difference between t[n+1] and t[n] is less than a certain value. I have a NumPy array that is a sequence of time stamps. I can find the differences between consecutive time stamps using numpy.diff(), but I have a hard time determining the clusters without looping through the array. To illustrate:

import numpy as np

t = np.array([ 147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
dt = np.diff(t)
# dt is array([5582,   65,   12,  992, 1958,   16,    4, 1200])

If my cluster condition is dt < 100, then t[1], t[2], and t[3] would form one cluster, and t[5], t[6], and t[7] would form another. I have tried playing around with numpy.where(), but I have had no success getting the conditions right to separate out the clusters, i.e.

cluster1 = np.array([5729, 5794, 5806])
cluster2 = np.array([8756, 8772, 8776])

or something along those lines.
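For reference, here is a minimal sketch (reusing the same t) of what the dt < 100 condition looks like as a boolean mask. Because dt[i] is the gap between t[i] and t[i+1], the mask tells you which time stamps belong to *some* cluster, but not *which* cluster, which is why a plain numpy.where() on it does not separate the groups. The names close and in_cluster are illustrative only.

```python
import numpy as np

t = np.array([147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
dt = np.diff(t)

# close[i] is True when t[i] and t[i+1] are less than 100 apart.
close = dt < 100  # [False, True, True, False, False, True, True, False]

# Mark every time stamp that touches a "close" gap on either side.
in_cluster = np.zeros(len(t), dtype=bool)
in_cluster[:-1] |= close  # left element of each close pair
in_cluster[1:] |= close   # right element of each close pair

# All clustered values come out together; the grouping is lost.
print(t[in_cluster])
```

The result is one flat array of all six clustered values, so an extra step is still needed to cut it into separate groups.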

Any help is appreciated.

1 Answer

import numpy as np

t = np.array([ 147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
dt  = np.diff(t)
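# Start a new group wherever the gap is NOT < 100 (the question's condition).
# +1 because dt[i] sits between t[i] and t[i+1], so the split goes before t[i+1].
pos = np.where(dt >= 100)[0] + 1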
print(np.split(t, pos))

The output is:

[array([147]), 
array([5729, 5794, 5806]), 
array([6798]), 
array([8756, 8772, 8776]), 
array([9976])]
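The split above also returns the isolated time stamps (147, 6798, 9976) as one-element arrays. If, as in the question, only the actual clusters are wanted, one option is to filter the pieces by length. The helper below is a hypothetical sketch, not part of NumPy; it assumes t is a 1-D array sorted in ascending order (as time stamps usually are), and min_size=2 is an assumed definition of "cluster".

```python
import numpy as np

def find_clusters(t, max_gap, min_size=2):
    """Hypothetical helper: split a sorted 1-D array into runs whose
    consecutive gaps are < max_gap, keeping runs with >= min_size items."""
    t = np.asarray(t)
    # Indices where a new run starts (gap to the previous item is too large).
    breaks = np.flatnonzero(np.diff(t) >= max_gap) + 1
    groups = np.split(t, breaks)
    # Drop singletons (or any run shorter than min_size).
    return [g for g in groups if len(g) >= min_size]

t = np.array([147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
cluster1, cluster2 = find_clusters(t, max_gap=100)
print(cluster1, cluster2)
```

With min_size=1 the helper returns the same five pieces as np.split above; raising min_size filters out smaller groups.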