1

I have two arrays and from these two I have to create another one in this way:

  for i in arange(0,len(second_array),1):
     third_array[i] = my_function(first_array[i],second_array[i])

Here my_function is a procedure which takes two scalars as inputs and returns another scalar. My problem is that the arrays I usually work with are huge, so the above loop takes forever. Is there a way to avoid the loop but still fill third_array the way I want?

4 Answers

4

Since you're using arange, I take it you're using NumPy. Try to rewrite my_function so that it takes two arrays instead of two scalar values and use vectorized operations.
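As a minimal sketch of what such a rewrite could look like: the question never shows the body of my_function, so the formula below (a * b + sin(a)) is purely a made-up stand-in. Written with NumPy operations, the same function accepts scalars or whole arrays:

```python
import numpy as np

def my_function(a, b):
    # Hypothetical body: the question does not show the real computation.
    # np.sin and the arithmetic operators work element-wise on arrays.
    return a * b + np.sin(a)

first_array = np.linspace(0.0, 1.0, 5)
second_array = np.linspace(1.0, 2.0, 5)

# One call on the whole arrays replaces the explicit Python loop.
third_array = my_function(first_array, second_array)

# For comparison: the element-by-element version from the question.
looped = np.array([my_function(a, b) for a, b in zip(first_array, second_array)])
```

If the real function branches on its inputs with if/else, np.where is the usual element-wise replacement, though whether that fits depends on what my_function actually does.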

3

How could you avoid looping if you need to access each element of both lists? I don't really understand your question.

But you can do it a bit more simply. In Python 3:

third_array = [my_function(a, b) for a, b in zip(first_array, second_array)]

In Python 2, it's better to use

from itertools import izip
third_array = [my_function(a, b) for a, b in izip(first_array, second_array)]
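Note that either version gives a plain list. If a NumPy array is needed, as the question suggests, the result can be converted afterwards. A small sketch for Python 3, with a made-up my_function standing in for the real one (its body is an assumption):

```python
import numpy as np

def my_function(a, b):
    # Placeholder for the asker's function; the real body is not shown.
    return a + 2 * b

first_array = np.arange(5)
second_array = np.arange(5, 10)

# Build the values pairwise, then convert the list to an array.
third_array = np.array([my_function(a, b) for a, b in zip(first_array, second_array)])

# np.fromiter skips the intermediate list; it needs an explicit dtype.
third_array_iter = np.fromiter(
    (my_function(a, b) for a, b in zip(first_array, second_array)),
    dtype=np.int64,
    count=len(first_array),
)
```

This still calls my_function once per element in Python, so it is tidier than the index loop rather than fundamentally faster.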

1

Using map seems to be marginally quicker than a list comprehension (the timings here are from Python 2, where map returns a list):

import cProfile, numpy as np
from itertools import izip  # Python 2 only; in Python 3 use the built-in zip
from operator import add

A = np.random.rand(1000000)
B = np.random.rand(1000000)

>>> cProfile.run('C = map(add, A, B)')
         3 function calls in 0.693 seconds

>>> cProfile.run('C = [a+b for a,b in izip(A,B)]')
         2 function calls in 0.765 seconds

>>> cProfile.run('for i in np.arange(0,len(B),1): C[i] = A[i]+B[i]')
         4 function calls in 1.971 seconds

But as @larsmans says, using a vectorized solution will be much quicker, here by more than two orders of magnitude:

>>> cProfile.run('C = A + B')
         2 function calls in 0.005 seconds
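For anyone repeating the comparison on Python 3, here is a rough sketch using timeit instead of cProfile. The array size and repeat count are arbitrary choices of mine, and the numbers will vary by machine, so none are quoted here:

```python
import timeit
import numpy as np
from operator import add

A = np.random.rand(100_000)
B = np.random.rand(100_000)

def with_map():
    # list() forces evaluation, because map is lazy in Python 3.
    return list(map(add, A, B))

def with_comprehension():
    return [a + b for a, b in zip(A, B)]

def vectorized():
    # Element-wise addition done inside NumPy, no Python-level loop.
    return A + B

for fn in (with_map, with_comprehension, vectorized):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```

Without the list() call, the map version would only build an iterator and the timing would say nothing about the actual work.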

1

Since you're already using NumPy, it may be worth exploring universal functions (ufunc) and numpy.frompyfunc().

In [1]: import numpy as np

In [2]: first_array = np.arange(10)

In [3]: second_array = np.arange(10, 20)

In [5]: def halfsum(a, b): return (a + b) / 2.0
   ...: 

In [7]: halfsum_ufunc = np.frompyfunc(halfsum, 2, 1)

In [8]: halfsum_ufunc(first_array, second_array)
Out[8]: array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0], dtype=object)

One caveat is that ufuncs created with frompyfunc always return object arrays (dtype=object), as the output above shows. I am not sure if there's a way around that.
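Two possible workarounds for the object dtype, offered as a sketch rather than a definitive answer: convert the result with astype, or use np.vectorize, which accepts the desired output type through its otypes argument. Both reuse the halfsum example from above:

```python
import numpy as np

def halfsum(a, b):
    return (a + b) / 2.0

first_array = np.arange(10)
second_array = np.arange(10, 20)

# Option 1: convert the object array returned by the frompyfunc ufunc.
halfsum_ufunc = np.frompyfunc(halfsum, 2, 1)
as_float = halfsum_ufunc(first_array, second_array).astype(np.float64)

# Option 2: np.vectorize takes the output dtype up front via otypes.
halfsum_vec = np.vectorize(halfsum, otypes=[np.float64])
also_float = halfsum_vec(first_array, second_array)
```

Neither option removes the per-element Python call, so they fix the dtype but should not be expected to match a truly vectorized expression such as (first_array + second_array) / 2.0 for speed.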
