numpy missing attributes depending on array size

Question

Here is a small code which demonstrates the error I am getting.

import numpy as np

r=4.0
L=20.0
ratio = 4*np.pi / 3.0 * (r/L)**3

for i in range(5, 16):
    n = 10**i
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(100, size=(n,3))    
    jp = np.random.random_integers(100, size=(m,3))

    a = np.expand_dims(ip, -1) == jp.T
    b = np.where( a.all(axis=1).any(axis=1) )[0]

I get the following output:

5 100000 3351
6 1000000 33510 
Traceback (most recent call last):
  File "example.py", line 16, in <module>
    b = np.where( a.all(axis=1).any(axis=1) )[0]
AttributeError: 'bool' object has no attribute 'all'

Anyone know what is going on here?

Alternatively, a reasonably quick way of indexing the location of elements of jp in ip would also work. I may go with the second solution from here

Where i==5 the size of array a is already on the order of 1 billion elements, where i==6 its 100 billion elements (100 GB of data if bool). Are these extremely large arrays intentional? — Daniel
– Daniel, Commented Jul 26, 2013 at 18:49
The size is intentional, they are for a quite dense computational grid. At i==6 np.expand_dims(ip, -1).nbytes == 24000000 which should be within reason, no? — user1984528
– user1984528, Commented Jul 26, 2013 at 21:02
Yes that is reasonable, but you are broadcasting this against jp.T to give the above sizes for a. print a.nbytes/1E9 for the size in GB. — Daniel
– Daniel, Commented Jul 26, 2013 at 21:06
I see, you're correct. I think instead I will use another approach (now linked in the question) which should avoid that memory issue. Thanks — user1984528
– user1984528, Commented Jul 26, 2013 at 21:14

Daniel · Accepted Answer · 2013-08-14 23:59:06Z

You are broadcasting ip against jp creating extremely large arrays. When i==6 you have a 100GB array.

A solution is to loop over the array:

for i in range(2,6):
    t=time.time()
    n = 10**i+1
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(10, size=(n,3))
    jp = np.random.random_integers(10, size=(m,3))

    chunksize=10000
    if chunksize>ip.shape[0]:
        chunksize=ip.shape[0]
    span=ip.shape[0]/chunksize
    remainder=(ip.shape[0]-span*chunksize)

    out=[]
    start=0
    for n in xrange(span):
        end=start+chunksize
        a = np.expand_dims(ip[start:end], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+start)
        start+=chunksize

    if remainder!=0:
        a = np.expand_dims(ip[-remainder:], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+end)

    end=np.sort(np.concatenate(out))
    print time.time()-t,end.shape

Time is about 10 seconds for i==6 so i==7 will take about 20 minutes.

Collectives™ on Stack Overflow

numpy missing attributes depending on array size

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related