How to optimise multi dimension numpy array calculation?

Question

Given a 5D array, the objective is to calculate the difference between the extracted two arrays. For simplicity, say to measure the difference at the second position which can be denoted as bt and lf. The value from these two array can be extracted as follows:

arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :]

Note that in the above, the index for the first (ep), third (mt) and fourth (bd) axes are the same for both of the arrays, with only the position index of second axis differing (bt and lf).

Based on this requirement, the following code was proposed, and pack under the function nested_for_loop:

import numpy as np
from joblib import Parallel, delayed
np.random.seed(0)

ub_lb_pair = np.tril_indices (5, -1)

arr = np.random.randn(3, 5, 4, 3, 2)
my_shape = arr.shape

def nested_for_loop():
    store_cval = np.full([my_shape[0], 10, my_shape[2], my_shape[3], my_shape[4]],
                         np.nan)  # preallocate
    for ep in range(0, my_shape[0]):
        for mt in range(0, my_shape[2]):
            for bd in range(0, my_shape[3]):
                for idx,(bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                    store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - \
                                                     arr[ep, lf, mt, bd, :]
    return store_cval


store_cval = nested_for_loop()

However, I would like to make the code much more compact and efficient if possible.

One approach I can think of is take advantage of the joblib parallel module, which can be achieve as below as shown under the function multi_prop.

def multi_prop(my_arr, ep):
    temp_ = np.full([10, my_shape[2], my_shape[3], my_shape[4]],
                    np.nan)
    for mt in range(0, my_shape[2]):
        for bd in range(0, my_shape[3]):
            for idx, (bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                temp_[idx, mt, bd, :] = my_arr[ep, bt, mt, bd, :] - my_arr[ep, lf, mt, bd, :]
                x = 1
    return  temp_

dist_abs = Parallel(n_jobs=-1)(delayed(multi_prop)(arr, ep) for ep in range(0, my_shape[0]))

dist_abs = np.array(dist_abs)
bb = np.array_equal(store_cval, dist_abs)

But, I wonder whether the is a more numpythonic way to achieve the same objective.

Why a slice for just the last dimension? arr[:,bt,:,:,:] and arr[:,lf,:,:,:] or even arr[:,bt,...] and arr[:,lf,...]? — hpaulj
– hpaulj, Commented Feb 3, 2021 at 17:35
Say the last last dimension is of shape 2 and, and each of having the value [3,4] and [1,1] . So the diffrence between them are [2,3]. Does this make sense to you @hpaulj? — rpb
– rpb, Commented Feb 3, 2021 at 17:47

Mad Physicist · Accepted Answer · 2021-02-04 13:32:51Z

2

You don't really need any loops at all. Imagine this pair of fancy indices:

bt, lf = np.tril_indices (5, -1)

You are looking for

store_cval = arr[:, bt] - arr[:, lf]

Keep in mind that store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :] is an implicit loop over the last index. They're all loops, and you don't need any of them over the hood.

A more general solution:

def diffs(arr, axis):
    a, b = np.tril_indices(arr.shape[axis], -1)
    ind1 = [slice(None) for _ in range(arr.ndim)]
    ind2 = ind1.copy()
    ind1[axis] = a
    ind2[axis] = b
    return arr[tuple(ind1)] - arr[tuple(ind2)]

edited Feb 4, 2021 at 13:32

answered Feb 3, 2021 at 18:07

Mad Physicist

116k29 gold badges202 silver badges292 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

rpb Over a year ago

Hi @Mad, thanks for the detail explaination. For some reason, I am unable to reproduce the above result. For example, what is the value of axis. Also, my compiler provide the warning End of statement expected on the line ind1 = [slice(None) for _ in range arr.ndim]. And finally, do you mean tril to be np.tril_indices ?

Mad Physicist Over a year ago

@balandongiv. Sorry about the typos. Fixed now. In your case, you want to use combinations along the second axis, so axis=1

Collectives™ on Stack Overflow

How to optimise multi dimension numpy array calculation?

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related