I've been writing MATLAB code for many years and recently started writing in Python. Let me try to explain the problem I am facing:
Part of my code associates cells of a large array, say an image of size 1080x1400 for the sake of the example, with cells of a smaller array, a grid of size 770x770. The large array may map to the whole grid or only to a section of it, so a large number of cells in the large array can be associated with the same cell in the small array. I have written two versions of the code, one in MATLAB and one in Python.
For some reason, the MATLAB code runs in an average of 41 ms, while the Python code runs in an average of 4.1 s in PyCharm (both averaged over 100 runs). Is there anything I can do to substantially improve NumPy's performance?
Although I always try to write in vectorized form, in this case the code is written with a for loop, which I think is appropriate here.
Thanks
Links to Example Input Data:
Matlab Code:
%%
clear;clc;
InputCoord = readmatrix('InputCoord.csv');
%%
Wx = InputCoord(:,3)' + 1;   % +1 converts the 0-based CSV coordinates to MATLAB's 1-based indexing
Wy = InputCoord(:,4)' + 1;
OutMtx = zeros(770,770);
%%
fp_Row = InputCoord(:,1)' + 1;
fp_Col = InputCoord(:,2)' + 1;
DataMtx = single(imread('DataMtx.tif'))./255;   % normalize 8-bit image to [0,1]
%%
number_of_times = 100;
t_stop = zeros(number_of_times,1);
for jj = 1:number_of_times
    N = 1;
    t_start = tic;
    for ii = 1:size(Wx,2)
        Wx_ind = Wx(ii);
        Wy_ind = Wy(ii);
        fp_Row_ind = fp_Row(ii);
        fp_Col_ind = fp_Col(ii);
        % reset the running-mean counter whenever the target cell changes
        if ii>1 && (Wx(ii)~=Wx(ii-1) || Wy(ii)~=Wy(ii-1))
            N = 1;
        end
        % running mean of all DataMtx samples mapped to the same OutMtx cell
        OutMtx(Wx_ind, Wy_ind) = ((N-1)*OutMtx(Wx_ind, Wy_ind) + DataMtx(fp_Row_ind, fp_Col_ind))/N;
        N = N + 1;
    end
    t_stop(jj) = toc(t_start);
end
Python Code:
import numpy as np
import cv2
import time
InputCoord = np.genfromtxt('InputCoord.csv', delimiter=',')
number_of_coords = np.shape(InputCoord)[0]
# coordinates in the CSV are already 0-based, so no +1 as in the MATLAB version
Wx = InputCoord[:, 2].astype(dtype=np.int32).reshape((1, number_of_coords))
Wy = InputCoord[:, 3].astype(dtype=np.int32).reshape((1, number_of_coords))
OutMtx = np.zeros((770, 770))
fp_Row = InputCoord[:, 0].astype(dtype=np.int32).reshape((1, number_of_coords))
fp_Col = InputCoord[:, 1].astype(dtype=np.int32).reshape((1, number_of_coords))
DataMtx = cv2.imread('DataMtx.tif', -1).astype(dtype=np.float32) / 255  # -1 = IMREAD_UNCHANGED
# print(f' DataMtx flags:{DataMtx.flags}')
DataMtxf = np.asarray(DataMtx, order='F')  # Fortran-order copy, to mimic MATLAB's memory layout
number_of_times = 100
t_stop = np.zeros((1, number_of_times))
for jj in range(number_of_times):
    t_start = time.time()
    N = 1
    for ii in range(number_of_coords):
        Wx_ind = Wx[0, ii]
        Wy_ind = Wy[0, ii]
        fp_Row_ind = fp_Row[0, ii]
        fp_Col_ind = fp_Col[0, ii]
        # reset the running-mean counter whenever the target cell changes;
        # ii > 0 (not ii > 1) because Python indices start at 0, MATLAB's at 1
        if (ii > 0) and ((Wx[0, ii] != Wx[0, ii - 1]) or (Wy[0, ii] != Wy[0, ii - 1])):
            N = 1
        OutMtx[Wx_ind, Wy_ind] = ((N - 1) * OutMtx[Wx_ind, Wy_ind] + DataMtx[fp_Row_ind, fp_Col_ind]) / N
        N = N + 1
    t_stop[0, jj] = time.time() - t_start
print(f'mean update time = {np.mean(t_stop)}')
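For reference, the hot loop above can in principle be replaced by whole-array NumPy operations. The sketch below is an illustration, not the original method: it assumes, as the if test implies, that identical (Wx, Wy) pairs arrive in contiguous runs, in which case the running mean over a run reduces to the plain mean of that run's DataMtx samples (the names vals, new_run, run_id, run_means, and OutMtx2 are introduced here purely for illustration):
vals = DataMtx[fp_Row.ravel(), fp_Col.ravel()]  # gather all samples at once
wx, wy = Wx.ravel(), Wy.ravel()
# flag the start of each contiguous run of identical (wx, wy) pairs
new_run = np.ones(wx.shape, dtype=bool)
new_run[1:] = (wx[1:] != wx[:-1]) | (wy[1:] != wy[:-1])
run_id = np.cumsum(new_run) - 1  # run labels: 0,0,...,1,1,...,2,...
# per-run mean = per-run sum / per-run length
run_means = np.bincount(run_id, weights=vals) / np.bincount(run_id)
# scatter each run's mean into its target cell; with duplicate targets,
# NumPy assignment in practice keeps the last write, like the loop does
OutMtx2 = np.zeros((770, 770))
OutMtx2[wx[new_run], wy[new_run]] = run_means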
Comments:
- "… Wx, Wy, fp_Row, fp_Col, DataMtxf. Does your Python code do what you want?"
- "… the ii iteration. numpy does not do this. Vectorize where possible (like I did in MATLAB years ago), or use numba to create a compiled version."
- "… the ii loop, with minimal reproducible example values. In numpy, things like reshape((1, number_of_coords)) and Wx[0, ii] look like carryovers from MATLAB. They don't hurt performance, but they clutter the code. But the iterative nature of N may be the biggest obstacle to speeding up the code by using whole-array numpy operations ("vectorization"). I don't have a clear sense of what's happening with that."
- "… shared in links - the example data in your minimal reproducible example should be in the question; we should not have to get it from an offsite resource. I concur with @hpaulj's comments."
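Following the numba suggestion in the comments above, here is a minimal sketch of a compiled version of the same loop (it assumes numba is installed; update is a hypothetical helper name introduced here). @numba.njit compiles the function to machine code on its first call, which removes the per-element Python overhead while keeping the loop structure as-is:
import numba

@numba.njit
def update(OutMtx, DataMtx, Wx, Wy, fp_Row, fp_Col):
    # same running-mean loop as above: 1-D index arrays in, OutMtx updated in place
    N = 1
    for ii in range(Wx.shape[0]):
        if ii > 0 and (Wx[ii] != Wx[ii - 1] or Wy[ii] != Wy[ii - 1]):
            N = 1  # a new (Wx, Wy) run starts: reset the counter
        old = OutMtx[Wx[ii], Wy[ii]]
        OutMtx[Wx[ii], Wy[ii]] = ((N - 1) * old + DataMtx[fp_Row[ii], fp_Col[ii]]) / N
        N += 1

# usage with 1-D views of the arrays defined earlier:
# update(OutMtx, DataMtx, Wx.ravel(), Wy.ravel(), fp_Row.ravel(), fp_Col.ravel())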