
I'm slowly switching to Python, and I wanted to run a simple test comparing the performance of a simple element-wise array operation: I generate a random 1000x1000 array and add one to each of its values.

Here is my script in Python:

import time

import numpy
from numpy.random import random

def testAddOne(data):
    """
    Test addOne
    """
    return data + 1

i = 1000
data = random((i,i))
start = time.clock()
for x in xrange(1000): 
    testAddOne(data)

stop = time.clock()
print stop - start

And my function in MATLAB:

function test
%parameter declaration
c=rand(1000);

tic
for t = 1:1000
    testAddOne(c);
end
fprintf('Structure: \n')
toc
end

function testAddOne(c)
c = c + 1;
end

The Python script takes 2.77-2.79 seconds, about the same as the MATLAB function (I'm actually quite impressed by NumPy!). What would I have to change in my Python script to use multithreading? I can't do this in MATLAB since I don't have the Parallel Computing Toolbox.

  • Is that really fair on MATLAB, to add one element at a time, when you can do it in one go? That's where the power of MATLAB lies. If you inline the function, I wouldn't be surprised if you got some appreciable improvement with MATLAB. Commented Apr 4, 2014 at 13:31
  • @Divakar You should check the code again; MATLAB is adding the ones in a single call. What is probably misleading you is that I'm running the operation 1000 times, which also happens to be the length of the array. To see this, the for loop t = 1:1000 could just as well be t = 1:randi([1000,2000]). Commented Apr 4, 2014 at 13:39
  • That's right, which is why my function uses the MATLAB vectorized approach. If I didn't do this, I would have to use two nested for loops over i and j, going through each index to add one, which, like you said, wouldn't use the power of MATLAB (the sketch after these comments shows the same contrast in NumPy). Commented Apr 4, 2014 at 13:57
  • Looks like a fair comparison. Commented Apr 4, 2014 at 14:09
  • Some relevant discussion here, here and here. Commented Apr 4, 2014 at 14:26
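
The same vectorized-versus-indexed contrast these comments describe exists on the NumPy side too. A quick sketch of the difference (timings will vary by machine; it uses time.time() for wall-clock time):

import time
from numpy.random import random

data = random((1000, 1000))

# vectorized: a single call adds one to every element
start = time.time()
result = data + 1
print "vectorized:", time.time() - start

# explicit indexing: same result, but the loop runs in pure Python
start = time.time()
result = data.copy()
for i in xrange(1000):
    for j in xrange(1000):
        result[i, j] += 1
print "loops:", time.time() - start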

1 Answer


Multithreading in Python is only useful for situations where threads get blocked, e.g. waiting on input, which is not the case here: because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at a time (see the answers to this question for more details). However, multiprocessing is easy to do in Python. Multiprocessing in general is covered here.
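
If you want to measure the threading behaviour for yourself, a minimal threaded version of your benchmark could look like the sketch below (it reuses your testAddOne, and uses time.time() so the measurement is wall-clock time):

import time
import threading
from numpy.random import random

def testAddOne(data):
    return data + 1

def testAddN(data, N):
    for x in xrange(N):
        testAddOne(data)

if __name__ == '__main__':
    data = random((1000, 1000))
    num_threads = 4
    num_adds = 10000

    start = time.time()
    # split the additions across four threads
    threads = [threading.Thread(target=testAddN, args=(data, num_adds / num_threads))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print "Elapsed", time.time() - start

Because of the GIL, the threads mostly take turns rather than run in parallel (NumPy can release the GIL during the arithmetic itself, so the exact numbers you see will vary).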

A multiprocessing program taking a similar approach to your example is below:

import time
import numpy
from numpy.random import random
from multiprocessing import Process

def testAddOne(data):
    return data + 1

def testAddN(data,N):
    # print "testAddN", N
    for x in xrange(N): 
        testAddOne(data)

if __name__ == '__main__':
    matrix_size = 1000
    num_adds = 10000
    num_processes = 4

    data = random((matrix_size,matrix_size))

    start = time.clock()
    if num_processes > 1:
        # split the additions evenly across the worker processes
        processes = [Process(target=testAddN, args=(data,num_adds/num_processes))
                     for i in range(num_processes)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
    else:
        testAddN(data,num_adds)

    stop = time.clock()
    print "Elapsed", stop - start

A more useful example using a pool of worker processes to successively add 1 to different matrices is below.

import time
import numpy
from numpy.random import random
from multiprocessing import Pool

def testAddOne(data):
    return data + 1

def testAddN(dataN):
    # Pool.map passes a single argument, so unpack the (matrix, count) tuple
    data, N = dataN
    for x in xrange(N): 
        data = testAddOne(data)
    return data

if __name__ == '__main__':
    num_matrices = 4
    matrix_size = 1000
    num_adds_per_matrix = 2500

    num_processes = 4

    inputs = [(random((matrix_size,matrix_size)), num_adds_per_matrix)
              for i in range(num_matrices)]
    #print inputs # test using, e.g., matrix_size = 2

    start = time.clock()

    if num_processes > 1:
        proc_pool = Pool(processes=num_processes)
        outputs = proc_pool.map(testAddN, inputs)    
    else:
        outputs = map(testAddN, inputs)

    stop = time.clock()
    #print outputs # test using, e.g., matrix_size = 2
    print "Elapsed", stop - start

In this case the code in testAddN actually does something with the result of calling testAddOne. And you can uncomment the print statements to check that some useful work is being done.

In both cases I've changed the total number of additions to 10000; with fewer additions, the cost of starting up processes becomes more significant (but you can experiment with the parameters). You can also experiment with num_processes. On my machine, compared to running in a single process with num_processes=1, I got just under a 2x speedup by spawning four processes with num_processes=4.
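
For example, to compare process counts directly you could wrap the pool in a loop, as in the sketch below; it assumes testAddN and inputs are defined as in the pool example above, and it uses time.time() so the measurement is wall-clock time (on Windows, keep it under if __name__ == '__main__':).

import time
from multiprocessing import Pool

for num_processes in (1, 2, 4, 8):
    proc_pool = Pool(processes=num_processes)
    start = time.time()  # wall-clock time, so the parallel speedup is visible
    proc_pool.map(testAddN, inputs)
    print num_processes, "processes:", time.time() - start
    proc_pool.close()
    proc_pool.join()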


3 Comments

I'm trying your first example right now, and I got a 2.16x speedup (with 4 processes).
Still trying to understand what you are doing in the second example, but thanks for both solutions!
@m_power thanks for the feedback! In computing, map is used to mean: take one list L, process each element in the list with some function f, producing another list L'. The list L = [x0, x1, ..., xn] becomes L' = [f(x0), f(x1), ..., f(xn)]. What I really like about the Python process pool is that a multi-process map looks almost exactly the same as the single-process map.
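
To make that concrete, here is a tiny sketch of that equivalence (f and L are hypothetical stand-ins, not names from the answer above):

from multiprocessing import Pool

def f(x):
    # an arbitrary per-element function
    return x * x

if __name__ == '__main__':
    L = [0, 1, 2, 3]
    print map(f, L)       # single-process map: [0, 1, 4, 9]
    pool = Pool(processes=4)
    print pool.map(f, L)  # multi-process map: same result, same call shape
    pool.close()
    pool.join()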
