Iterating with Python

Question

I have two arrays, and from them I need to build a third one like this:

  for i in arange(0,len(second_array),1):
     third_array[i] = my_function(first_array[i],second_array[i])

Here my_function is a procedure that takes two scalars as input and returns another scalar. My problem is that the arrays I usually work with are huge, so the loop above takes forever. Is there a way to avoid the loop while still filling third_array the way I want?

Fred Foo · Accepted Answer · 2012-01-09 13:36:08Z

4

Since you're using arange, I take it you're using NumPy. Try to rewrite my_function so that it takes two arrays instead of two scalar values and use vectorized operations.
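As a minimal sketch of what such a rewrite might look like: the question does not show the body of my_function, so the version below is a hypothetical stand-in. The idea is only that a function built from NumPy operations accepts whole arrays as readily as scalars.

```python
import numpy as np

# Hypothetical stand-in for my_function; the question does not show its body.
def my_function(a, b):
    # np.sqrt and the arithmetic operators work element-wise on arrays,
    # so the same code accepts either scalars or whole arrays.
    return np.sqrt(a * a + b * b)

first_array = np.array([3.0, 6.0, 5.0])
second_array = np.array([4.0, 8.0, 12.0])

# One call replaces the explicit loop over indices.
third_array = my_function(first_array, second_array)
# third_array now holds 5.0, 10.0, 13.0
```

If the real function contains branches on its scalar arguments, those would need an array-aware equivalent such as np.where before this approach applies.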

answered Jan 9, 2012 at 13:36

Tim Pietzcker · Accepted Answer · 2012-01-09 13:35:57Z

3

How could you avoid looping if you need to access each element of both lists? I don't really understand your question.

But you can do it a bit more simply. In Python 3:

third_array = [my_function(a, b) for a, b in zip(first_array, second_array)]

In Python 2, it's better to use izip, which avoids building an intermediate list of pairs:

from itertools import izip
third_array = [my_function(a, b) for a, b in izip(first_array, second_array)]

answered Jan 9, 2012 at 13:35

Tim MB · Accepted Answer · 2012-01-09 15:15:51Z

1

Using map seems to be marginally quicker than a list comprehension:

import cProfile
import numpy as np
from itertools import izip  # Python 2 only; Python 3's built-in zip is already lazy
from operator import add

A = np.random.rand(1000000)
B = np.random.rand(1000000)

>>> cProfile.run('C = map(add, A, B)')
         3 function calls in 0.693 seconds

>>> cProfile.run('C = [a+b for a,b in izip(A,B)]')
         2 function calls in 0.765 seconds

>>> cProfile.run('for i in np.arange(0,len(B),1): C[i] = A[i]+B[i]')
         4 function calls in 1.971 seconds

But as @larsmans says, using a vectorized solution will be much quicker:

>>> cProfile.run('C = A + B')
         2 function calls in 0.005 seconds
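The timings above come from a Python 2 session. For anyone repeating the comparison on Python 3, a rough sketch using timeit might look like the following. The function names and the array size are choices made here for illustration, not part of the original benchmark, and the absolute numbers will vary by machine.

```python
import timeit
from operator import add

import numpy as np

A = np.random.rand(100_000)
B = np.random.rand(100_000)

def with_map():
    # list() forces evaluation; in Python 3, map alone is lazy.
    return list(map(add, A, B))

def with_comprehension():
    return [a + b for a, b in zip(A, B)]

def vectorized():
    return A + B

for fn in (with_map, with_comprehension, vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

Wrapping map in list() matters here: without it, Python 3 would only time the creation of the iterator, not the additions themselves.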

edited Jan 9, 2012 at 15:15

answered Jan 9, 2012 at 14:47

NPE · Accepted Answer · 2012-01-09 13:40:21Z

1

Since you're already using NumPy, it may be worth exploring universal functions (ufunc) and numpy.frompyfunc().

In [1]: import numpy as np

In [2]: first_array = np.arange(10)

In [3]: second_array = np.arange(10, 20)

In [5]: def halfsum(a, b): return (a + b) / 2.0
   ...: 

In [7]: halfsum_ufunc = np.frompyfunc(halfsum, 2, 1)

In [8]: halfsum_ufunc(first_array, second_array)
Out[8]: array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0], dtype=object)

One caveat is that frompyfunc-created ufuncs always return PyObject arrays. I am not sure if there's a way around that.
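Two possible workarounds for the object dtype, sketched under the assumption that the results are plain floats: convert the output with astype, or use np.vectorize with an explicit otypes argument. Neither is from the original answer, and both still call the Python function once per element, so they are a convenience rather than a speed-up.

```python
import numpy as np

def halfsum(a, b):
    return (a + b) / 2.0

first_array = np.arange(10)
second_array = np.arange(10, 20)

# Option 1: convert the object array produced by the frompyfunc ufunc.
halfsum_ufunc = np.frompyfunc(halfsum, 2, 1)
as_float = halfsum_ufunc(first_array, second_array).astype(float)

# Option 2: np.vectorize with an explicit output type.
halfsum_vec = np.vectorize(halfsum, otypes=[float])
also_float = halfsum_vec(first_array, second_array)

print(as_float.dtype, also_float.dtype)
```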

answered Jan 9, 2012 at 13:40
