A question from a complete Python novice.

I have a column array in which I need to force certain values to zero, depending on a condition applied to another array. I have found two solutions, and both give the correct answer. However, both are quite time-consuming for the larger arrays I typically need (>1E6 elements), and I suspect they are poor programming technique. The two versions are:

from numpy import zeros,abs,multiply,array,reshape

def testA(y, f, FC1, FC2):
    c = zeros((len(f),1))
    for n in xrange(len(f)):
        if abs(f[n,0]) >= FC1 and abs(f[n,0]) <= FC2:
            c[n,0] = 1.
    w = multiply(c,y)
    return w

def testB(y, f, FC1, FC2):
    z = [(abs(f[n,0])>=FC1 and abs(f[n,0])<=FC2) for n in xrange(len(f))]
    z = multiply(array(z,dtype=float).reshape(len(f),1), y)
    return z

The input arrays are column arrays as this matches the post processing to be done. The test can be done like:

>>> from numpy import arange
>>> from numpy.random import normal as randn
>>> fs, N = 1.E3, 2**22
>>> f = fs/N*arange(N).reshape((N,1))
>>> x = randn(size=(N,1))
>>> w1 = testA(x,f,200.,550.)
>>> z1 = testB(x,f,200.,550.)

On my laptop testA takes 18.7 seconds and testB takes 19.3 - both for N=2**22. In testB I also tried to include "z = [None]*len(f)" to preallocate as suggested in another thread but this doesn't really make any difference.

I have two questions, which I hope to have the same answer:

  1. What is the "correct" Python solution to this problem?
  2. Is there anything I can do to get the answer faster?

I have deliberately not spent any time on compiled Python, for example, because I wanted to have some working code first, and hopefully code that is also good Python style. I hope to get the execution time for N=2**22 below two seconds or so. This particular operation will be used many times, so the execution time does matter.

I apologize in advance if the question is stupid - I haven't been able to find an answer in the overwhelming amount of not always easily accessible Python documentation or in another thread.

  • Is it required to use arrays for y, f and the return value? Why not use lists instead? Commented Sep 29, 2011 at 8:19
  • In the following processing I need to do a bunch of matrix operations and I expected it to be the easiest (and what a user of the code would expect) to stay with the arrays. But if lists are better I could perhaps just transfer to arrays later. Commented Sep 29, 2011 at 8:42
  • The proposal by HYRY works great. It shaved time to below 0.4 seconds on my laptop. I am perfectly happy with this. Thanks a lot for the help! It is highly appreciated. Commented Sep 29, 2011 at 8:44

2 Answers

Use a boolean array to index into array y:

def testC(y, f, FC1, FC2):
    f2 = abs(f)
    idx = (f2>=FC1) & (f2<=FC2)
    y[~idx] = 0
    return y

2 Comments

Yes, that I also modified. But it really makes a huge difference - still easy to read and it performs very well indeed. Thanks for all your comments and help!
+1: That's the standard, efficient NumPy way, as far as I know.
0

All of these are slower than HYRY's solution by a large factor:

How about

( x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) )

If you need to do random access (very slow)

[ x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) ]

or you can also use map

map(lambda x: x[1] if FC1<=abs(x[0])<=FC2 else 0 , itertools.izip(f,x))

or using vectorize (faster than A and B but much much slower than C)

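import numpy as np
# a is the frequency, b the value to keep; otypes=[float] stops the output dtype being inferred from the first result (the integer 0 for the f above)
b1v = np.vectorize(lambda a,b: b if 200<=abs(a)<=550 else 0, otypes=[float])
b1 = b1v(f,x)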
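
The NumPy documentation describes np.vectorize as a convenience wrapper whose implementation is essentially a for loop, which is consistent with it being much slower than the boolean mask. A small sketch to check that both approaches give the same result and to time them yourself follows. It assumes Python 3, uses a reduced N to keep the run short, and the names masked and band are hypothetical.

```python
import timeit
import numpy as np

fs, N = 1.0e3, 2**16                      # smaller N than in the question
f = (fs / N * np.arange(N)).reshape((N, 1))
x = np.random.normal(size=(N, 1))

def masked(y, f, FC1, FC2):
    """Boolean-mask version: keep y where FC1 <= |f| <= FC2, else 0."""
    f2 = np.abs(f)
    return np.where((f2 >= FC1) & (f2 <= FC2), y, 0.0)

# Element-wise Python function wrapped by np.vectorize (a Python-level loop)
band = np.vectorize(lambda fv, yv: yv if 200.0 <= abs(fv) <= 550.0 else 0.0,
                    otypes=[float])

same = np.array_equal(masked(x, f, 200.0, 550.0), band(f, x))
t_mask = timeit.timeit(lambda: masked(x, f, 200.0, 550.0), number=5)
t_vec = timeit.timeit(lambda: band(f, x), number=5)
print(same, t_mask, t_vec)
```

The absolute timings depend on the machine, so only the ratio between the two is meaningful.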

4 Comments

NumPy functions are usually way faster than standard Python functions, on arrays. So, while this works, this does not fit perfectly with the "large array" part of the question.
Isn't map essentially doing the same as the second one you propose? I got the impression that map is essentially a generated list. But I could easily have misunderstood this.
Is this the equivalent of map for NumPy? docs.scipy.org/doc/numpy/reference/generated/…
Ah, I tested them, including the vectorize version. HYRY's version is way faster.
