1

I am trying to find the minimum image separation between two sets of particles. I have about 40 particles in each set, and their position vectors (three dimensional) are stored in two arrays of dimension (40, 3). I have to compute the Euclidean distance between each of the particles in one set, and each of the particles in the other, after applying the minimum image criterion. To make it clearer, a one-dimensional equivalent of the same would be, for two lists pos1 and pos2 with the coordinates, [func(i-j) for i in pos1 for j in pos2] where func = lambda x: x - np.rint(x/width)*width is a function which applies the minimum criterion.

In three dimensions, the Euclidean distance would be numpy.sqrt(dx**2 + dy**2 + dx**2) where dx, dy, and dz are returned by func for each dimension.

(The function func is just to demonstrate how minimum image criterion is applied. I do not use the same exact program structure.)

I am looking for an efficient way to do this, as I have to do the same operations as part of analysing multiple data sets, each with about 20000 time steps, and each time step containing 3 sets of 40 particles each, i.e. 6 combinations of sets to compute for each timestep, in each data set.

Googling led me to scipy.spatial.distance.cdist but I am having trouble optimising the computations time. The inbuilt routines for distances (Euclidean, Minkowski, Manhattan, Chebyshev etc.) are optimised and run pretty fast (upto three orders of magnitude in my tests below), in comparison to explicit function definitions given as arguments:

In [1]: import numpy as np

In [2]: from scipy.spatial.distance import cdist, euclidean

In [3]: %%timeit
   ...: pos1 = np.random.rand(40, 3)
   ...: pos2 = np.random.rand(40, 3)
   ...: cdist(pos1, pos2, metric='euclidean')
   ...:
The slowest run took 12.46 times longer than the fastest. 
This could mean that an intermediate result is being cached
10000 loops, best of 3: 39.3 µs per loop

In [4]: %%timeit
   ...: pos1 = np.random.rand(40, 3)
   ...: pos2 = np.random.rand(40, 3)
   ...: cdist(pos1, pos2, metric=euclidean)
   ...:
10 loops, best of 3: 43 ms per loop

In [5]: %%timeit
   ...: pos1 = np.random.rand(40, 3)
   ...: pos2 = np.random.rand(40, 3)
   ...: cdist(pos1, pos2, lambda u, v: np.sqrt(((u-v)**2).sum()) )
   ...:
100 loops, best of 3: 15.5 ms per loop

In [6]: width = 1.0

In [7]: func = lambda x: x - np.rint(x/width)*width

In [8]: %%timeit
   ...: pos1 = np.random.rand(40, 3)
   ...: pos2 = np.random.rand(40, 3)
   ...: cdist(pos1, pos2, lambda u, v: np.sqrt(((func(u)-func(v))**2).sum()) )
   ...:
10 loops, best of 3: 31.2 ms per loop

Here is what I have considered as options:

  • Explicitly loop over array elements and build the required array (probably the least efficient)
  • Separate array into the three x, y, x components, apply minimum image criterion, and use cdist for calculating euclidean distances for each component individually (because numpy.sqrt(dx**2) == dx and so on), reconstruct (40, 3) array from the component arrays, and repeat cdist for calculating distances in 3D

What would be an efficient way to calculate the equivalent of cdist(pos1, pos2, lambda u, v: np.sqrt(((func(u)-func(v))**2).sum()) )?


Question:

Is there any inbuilt NumPy function that can give an equivalent of [(i-j) for i in pos1 for j in pos2], but for specified axis of two arrays?

An example representation for what I intend to achieve:

    [ a  0  0 ]           [ x  0  0 ]
A = [ b  0  0 ]  ;    B = [ y  0  0 ]
    [ c  0  0 ]           [ z  0  0 ]

              [ a-x  0  0 ]
              [ a-y  0  0 ]
              [ a-z  0  0 ]
              [ b-x  0  0 ]
     Result = [ b-y  0  0 ]
              [ b-z  0  0 ]
              [ c-x  0  0 ]
              [ c-y  0  0 ]
              [ c-z  0  0 ]

(All values are float, and the operation is to be done for all columns.)

0

1 Answer 1

2

I'm not sure that I've understood exactly what you want to do... should I have, you can do an elementwise difference using broadcasting, see below

In [24]: a = np.random.random((5,3))
In [25]: b = np.random.random((5,3))
In [26]: c = a[:,None,:]-b
In [27]: c[3,4]
Out[27]: array([ 0.55732535,  0.30270483,  0.48249629])
In [28]: a[3]-b[4]
Out[28]: array([ 0.55732535,  0.30270483,  0.48249629])
In [29]: c[0,3]
Out[29]: array([ 0.28562698,  0.33227255,  0.35890964])
In [30]: a[0]-b[3]
Out[30]: array([ 0.28562698,  0.33227255,  0.35890964])
In [31]: 
Sign up to request clarification or add additional context in comments.

6 Comments

This is exactly what I want! Didn't know it was this straightforward... Now I can apply the minimum image criterion directly on the array. However, after reading the documentation, I hope I understand a[:,None,:]-b correctly: The elementwise comparison is now a[0,None] - b[0], a[1,None] - b[1] and so on. Do I understand it correctly?
It seems that a[:,None]-b also gives the same result as a[:,None,:]-b for two 2D arrays that I tried. How exactly does a[:,None,:] work?
@KrishnadevN: You can always add : to the right of an index, and it will have no effect. So for a 2d array, mat[0] is the same as mat[0,:]
@Eric : Okay, but the array is 2D. So is a[:,None,:] converting my 2D array to 3D and doing the operations?
None doesn't "use up" a dimension when indexing, so the second : is still referring to the second axis of the input
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.