Numpy/Python performing terribly vs. Matlab

Question

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.

My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.

I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.

Here is the python code for the main loop:

for basecell in range (0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc=np.array((cellrecord[basecell,0]))
        yloc=np.array((cellrecord[basecell,1]))
        xedgedist=(xbound-xloc)
        yedgedist=(ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and    yedgedist>excludedist:
            for comparecell in range (0, cellnumber-1):
                if secondcelltype==np.array((cellrecord[comparecell,2])):
                    xcomploc=np.array((cellrecord[comparecell,0]))
                    ycomploc=np.array((cellrecord[comparecell,1]))
                    dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist=round(dist)
                    if dist>=1 and dist<=analysisdist:
                         arraytarget=round(dist*analysisdist/intervalnumber)
                         addone=np.array((spatialraw[arraytarget-1]))
                         addone=addone+1
                         targetcell=arraytarget-1
                         np.put(spatialraw,[targetcell,targetcell],addone)

Here is the matlab code for the main loop:

for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
         xloc=cellrecord(basecell,1);
         yloc=cellrecord(basecell,2);
         xedgedist=(xbound-xloc);
         yedgedist=(ybound-yloc);
         if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
             for comparecell = 1:cellnumber;
                 if secondcelltype==cellrecord(comparecell,3);
                     xcomploc=cellrecord(comparecell,1);
                     ycomploc=cellrecord(comparecell,2);
                     dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
                     if (dist>=1) && (dist<=100.4999);
                         arraytarget=round(dist*analysisdist/intervalnumber);
                         spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
                    end
                end
            end            
        end
    end
end

Thanks!

Are you certain that your two routines are giving the same results (i.e. that they are both performing the computation correctly)? — gnovice
– gnovice, Commented Sep 28, 2010 at 19:26
Yep, I'm checking the spatialsum/spatialraw (just noticed I named them differently.) They're both adding correctly. — Nissl
– Nissl, Commented Sep 28, 2010 at 22:07
BUG: Use range(cellnumber), python excludes the upper limit. — Beni Cherniavsky-Paskin
– Beni Cherniavsky-Paskin, Commented Sep 28, 2010 at 22:27

Justin Peel · Accepted Answer · 2011-01-15 07:47:40Z

27

Here are some ways to speed up your python code.

First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,

if firstcelltype == np.array((cellrecord[basecell,2])):

can just be

 if firstcelltype == cellrecord[basecell,2]:

I'll show you why with some timeit statements:

>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071

That's an order of magnitude in difference between those calls.

Second: The following code:

arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)

can be replaced with

arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1

Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.

Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:

comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]

and then replace

for comparecell in range (0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):

with

for comparecell in comparecells:

Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.

Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:

basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xloc
yedgedists = ybound - yloc
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecords[comparecells,0]
ycomplocs = cellrecords[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
    dists = np.round((xcomplocs-xlocs[basecell])**2 + (ycomplocs-ylocs[basecell])**2)
    whichcells = np.where((dists >= 1) & (dists <= analysisdist2))[0]
    arraytargets = np.round(dists[whichcells]*analysisdist/intervalnumber) - 1
    for target in arraytargets:
        spatialraw[target] += 1

You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing. You make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.

edited Jan 15, 2011 at 7:47

answered Sep 28, 2010 at 19:24

Justin Peel

47.1k6 gold badges62 silver badges81 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Justin Peel Over a year ago

Also, if aren't running your code inside of a function, then that should give you a speed-up as well. Alex Martelli mentions this in many of his posts and I've found it to be quite true.

Nissl Over a year ago

So far tested 1-4 and Justin, Juri and Philip's suggestions and have a ~3x speed increase. Going to have a look at pysco now. Thanks for all the suggestions guys.

Justin Peel Over a year ago

@Nissl, does that mean that you were able to get it down to 5 seconds from 20 with the range -> xrange change as well? Do you mind sharing how low you get it down to in the end? I'm just curious.

JudoWill · Accepted Answer · 2010-09-28 18:13:13Z

2

Not too sure about the slowness of python but you Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with a vectorized function ... as below:

for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
         xloc=cellrecord(basecell,1);
         yloc=cellrecord(basecell,2);
         xedgedist=(xbound-xloc);
         yedgedist=(ybound-yloc);
         if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
%             for comparecell = 1:cellnumber;
%                 if secondcelltype==cellrecord(comparecell,3);
%                     xcomploc=cellrecord(comparecell,1);
%                     ycomploc=cellrecord(comparecell,2);
%                     dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
%                     if (dist>=1) && (dist<=100.4999);
%                         arraytarget=round(dist*analysisdist/intervalnumber);
%                         spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
%                    end
%                end
%            end
         %replace with:
        secondcelltype_mask = secondcelltype == cellrecord(:,3);
        xcomploc_vec = cellrecord(secondcelltype_mask ,1);
                ycomploc_vec = cellrecord(secondcelltype_mask ,2);
                dist_vec = sqrt((xcomploc_vec-xloc)^2+(ycomploc_vec-yloc)^2);
                dist_mask = dist>=1 & dist<=100.4999
                arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
                count = accumarray(arraytarget_vec,1, [size(spatialsum,1),1]);
                spatialsum(:,1) = spatialsum(:,1)+count;
        end
    end
end

There may be some small errors in there since I don't have any data to test the code with but it should get ~10X speed up on the Matlab code.

From my experience with numpy I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic has noticeable speed-ups as well. However, without the shapes the shapes of all of your variables its hard to vectorize things.

answered Sep 28, 2010 at 18:13

JudoWill

4,8312 gold badges38 silver badges49 bronze badges

3 Comments

MatlabDoug Over a year ago

For loops are not necessarily a problem. This was true pre-JIT (Just In Time Compiler). Use profiler to find problems. NOTE: I am not saying this solution is good or bad, just general advice on For Loops.

Nissl Over a year ago

I just tested this one, had to do a little finagling to get the code working with the rest of my parameter but I think it's pretty close. Unfortunately it dropped me to about 0.5x speed. Perhaps that has something to do with the JIT handling for loops well (I'm running 2009b)?

JudoWill Over a year ago

I stand corrected then ... although most of my Matlab hacking is on an old version pre-JIT.

Philip Starhill · Accepted Answer · 2010-09-28 18:49:11Z

0

You can avoid some of the math.sqrt calls by replacing the lines

                dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                dist=round(dist)
                if dist>=1 and dist<=analysisdist:
                     arraytarget=round(dist*analysisdist/intervalnumber)

with

                dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
                dist=round(dist)
                if dist>=1 and dist<=analysisdist_squared:
                     arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)

where you have the line

 analysisdist_squared = analysis_dist * analysis_dist

outside of the main loop of your function.

Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.

~~I would also try replacing~~

dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2

with

dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)

~~There's a chance it will produce faster byte code to do multiplication rather than exponentiation.~~

I doubt these will get you all the way to MATLABs performance, but they should help reduce some overhead.

edited Sep 28, 2010 at 18:49

answered Sep 28, 2010 at 17:42

Philip Starhill

2051 silver badge5 bronze badges

1 Comment

kennytm Over a year ago

In Python a ** 2 is often faster than a * a. Try timeit.

Juri Robl · Accepted Answer · 2010-09-28 19:17:03Z

0

If you have a multicore, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.

Instead of sqrt you could use x**0.5, which is, if I remember correct, slightly faster.

edited Sep 28, 2010 at 19:17

answered Sep 28, 2010 at 19:01

Juri Robl

5,7683 gold badges33 silver badges47 bronze badges

Collectives™ on Stack Overflow

Numpy/Python performing terribly vs. Matlab

4 Answers 4

3 Comments

3 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

3 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related