I am trying to find the minimum image separation between two sets of particles. There are about 40 particles in each set, and their three-dimensional position vectors are stored in two arrays of shape (40, 3). I have to compute the Euclidean distance between each particle in one set and each particle in the other, after applying the minimum image criterion. To make it clearer, a one-dimensional equivalent would be, for two lists pos1 and pos2 of coordinates, [func(i-j) for i in pos1 for j in pos2], where func = lambda x: x - np.rint(x/width)*width is a function that applies the minimum image criterion.
In three dimensions, the Euclidean distance would be numpy.sqrt(dx**2 + dy**2 + dz**2), where dx, dy, and dz are the components returned by func for each dimension.
(The function func is only meant to demonstrate how the minimum image criterion is applied; I do not use this exact program structure.)
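For concreteness, a naive nested-loop sketch of the full 3D operation I want to speed up would look something like this (width = 1.0 is only a placeholder value here):

import numpy as np

width = 1.0  # placeholder box length

def func(x):
    # minimum image criterion: wrap each component of the separation
    # vector into [-width/2, width/2)
    return x - np.rint(x / width) * width

pos1 = np.random.rand(40, 3)
pos2 = np.random.rand(40, 3)

# dist[i, j] = minimum-image Euclidean distance between pos1[i] and pos2[j]
dist = np.array([[np.sqrt((func(p - q) ** 2).sum()) for q in pos2]
                 for p in pos1])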
I am looking for an efficient way to do this, as I have to perform the same operations while analysing multiple data sets, each with about 20,000 time steps, and each time step containing 3 sets of 40 particles, i.e. 6 combinations of sets to compute per time step, in each data set.
Googling led me to scipy.spatial.distance.cdist, but I am having trouble optimising the computation time. The built-in routines for distances (Euclidean, Minkowski, Manhattan, Chebyshev, etc.) are optimised and run pretty fast (up to three orders of magnitude faster in my tests below) compared to explicit function definitions given as arguments:
In [1]: import numpy as np
In [2]: from scipy.spatial.distance import cdist, euclidean
In [3]: %%timeit
...: pos1 = np.random.rand(40, 3)
...: pos2 = np.random.rand(40, 3)
...: cdist(pos1, pos2, metric='euclidean')
...:
The slowest run took 12.46 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.3 µs per loop
In [4]: %%timeit
...: pos1 = np.random.rand(40, 3)
...: pos2 = np.random.rand(40, 3)
...: cdist(pos1, pos2, metric=euclidean)
...:
10 loops, best of 3: 43 ms per loop
In [5]: %%timeit
...: pos1 = np.random.rand(40, 3)
...: pos2 = np.random.rand(40, 3)
...: cdist(pos1, pos2, lambda u, v: np.sqrt(((u-v)**2).sum()) )
...:
100 loops, best of 3: 15.5 ms per loop
In [6]: width = 1.0
In [7]: func = lambda x: x - np.rint(x/width)*width
In [8]: %%timeit
...: pos1 = np.random.rand(40, 3)
...: pos2 = np.random.rand(40, 3)
...: cdist(pos1, pos2, lambda u, v: np.sqrt(((func(u)-func(v))**2).sum()) )
...:
10 loops, best of 3: 31.2 ms per loop
Here is what I have considered as options:
- Explicitly loop over array elements and build the required array (probably the least efficient)
- Separate the array into its three x, y, and z components, apply the minimum image criterion, and use cdist to calculate the distances for each component individually (because numpy.sqrt(dx**2) == dx and so on), then reconstruct the (40, 3) array from the component arrays and repeat cdist to calculate the distances in 3D (a rough sketch follows after this list)
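Roughly, the second option would look something like this (only the per-component part; this sketch assumes all coordinates already lie within [0, width)):

import numpy as np
from scipy.spatial.distance import cdist

width = 1.0  # placeholder box length
pos1 = np.random.rand(40, 3)
pos2 = np.random.rand(40, 3)

sq = np.zeros((len(pos1), len(pos2)))
for k in range(3):
    # 1D cdist gives the absolute separation |dx| for this component
    d = cdist(pos1[:, k:k + 1], pos2[:, k:k + 1])
    # apply the minimum image criterion to the separations
    d = d - np.rint(d / width) * width
    sq += d ** 2

dist = np.sqrt(sq)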
What would be an efficient way to calculate the equivalent of cdist(pos1, pos2, lambda u, v: np.sqrt(((func(u)-func(v))**2).sum()) )?
Question:
Is there any built-in NumPy function that can give an equivalent of [(i-j) for i in pos1 for j in pos2], but along a specified axis of two arrays?
An example representation for what I intend to achieve:
    [ a 0 0 ]       [ x 0 0 ]
A = [ b 0 0 ] ; B = [ y 0 0 ]
    [ c 0 0 ]       [ z 0 0 ]

         [ a-x 0 0 ]
         [ a-y 0 0 ]
         [ a-z 0 0 ]
         [ b-x 0 0 ]
Result = [ b-y 0 0 ]
         [ b-z 0 0 ]
         [ c-x 0 0 ]
         [ c-y 0 0 ]
         [ c-z 0 0 ]
(All values are float, and the operation is to be done for all columns.)
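In loop form (with placeholder numbers standing in for a, b, c and x, y, z), the result above corresponds to:

import numpy as np

# placeholder values, just to illustrate the shape of the result
A = np.array([[1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])
B = np.array([[10.0, 0.0, 0.0],
              [20.0, 0.0, 0.0],
              [30.0, 0.0, 0.0]])

# row-wise differences for every pair (row of A, row of B), stacked into (9, 3)
result = np.array([i - j for i in A for j in B])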