1

I'm a researcher working with climate model output using Python to find certain types of storms. I have 8 large numpy arrays (dimensions are 109574 x 52 x 57). These arrays are filled with 1's to signify there was a storm on that day (first dimension is time), 0 for no storm. The other two dimensions are latitude and longitude.

I have to eliminate back-to-back days from these arrays. For example, if there was a storm on day 1 and day 2, I would like to only count 1 storm. If there was a storm Day 1, 2, and 3, I would like to only count 1 and 3 for a total of two storms, Day 1-4 would have 2 storms, and so on. I found the # of storms at the end using np.sum to tally up the 1's in the array along the time axis.

I am running the following code to achieve this, but I am faced with the issue that it is extremely slow. Because I will have to repeat this procedure for other datasets, I was wondering if there was a way to speed this process up to be more efficient. I have my code below, and I would be more than happy to clarify anything.

# If there is a storm that overlaps two two-day periods, only count it once
print("Eliminating doubles...")
for i in range(52):
    for j in range(57):
        print(i,j)
        for k in range(109573):
            if((storms1[k,i,j]) == 1 and (storms1[k+1,i,j] == 1)):
                storms1[k,i,j] = 0
            if((storms2[k,i,j]) == 1 and (storms2[k+1,i,j] == 1)):
                storms2[k,i,j] = 0
            if((storms3[k,i,j]) == 1 and (storms3[k+1,i,j] == 1)):
                storms3[k,i,j] = 0
            if((storms4[k,i,j]) == 1 and (storms4[k+1,i,j] == 1)):
                storms4[k,i,j] = 0
            if((storms5[k,i,j]) == 1 and (storms5[k+1,i,j] == 1)):
                storms5[k,i,j] = 0
            if((storms6[k,i,j]) == 1 and (storms6[k+1,i,j] == 1)):
                storms6[k,i,j] = 0
            if((storms7[k,i,j]) == 1 and (storms7[k+1,i,j] == 1)):
                storms7[k,i,j] = 0
            if((storms8[k,i,j]) == 1 and (storms8[k+1,i,j] == 1)):
                storms8[k,i,j] = 0

Before someone suggests iterating through the arrays with a loop, I changed the variable names to simplify them for the purpose of asking this question.

Thanks for the help.

0

3 Answers 3

2

Here is a vectorised function that can replace your innermost loop:

def do(KK):
    # find stretches of ones
    switch_points = np.where(np.diff(np.r_[0, KK, 0]))[0]
    switch_points.shape = -1, 2
    # isolate stretches starting on odd days and create mask
    odd_starters = switch_points[switch_points[:, 0] % 2 == 1, :]
    odd_mask = np.zeros((KK.shape[0] + 1,), dtype=KK.dtype)
    odd_mask[odd_starters] = 1, -1
    odd_mask = np.add.accumulate(odd_mask[:-1])
    # apply global 1,0,1,0,1,0,... mask
    KK[1::2] = 0
    # invert stretches starting on odd days
    KK ^= odd_mask

call it from within the outer pair of loops (i and j):

do(storms1[:, i, j])
do(storms2[:, i, j])
etc.

It will change the arrays in-place.

This should be much faster than looping (the two outer loops don't make a difference).

How it works:

It finds the starting and endpoints of blocks of ones. We know that in each such block every other one has to be zerod. Using a global 1,0,1,0,1,0,... mask the algorithm zeros out every other day.

That produces

  • the correct result in blocks that start on even days
  • no change outside blocks
  • and the complement of the correct pattern in blocks that start on odd days

The last step of the algorithm is to invert these odd-starting blocks.

Sign up to request clarification or add additional context in comments.

Comments

2

An example using a 1D array that simulates your first axis. First, find where the groups of 1's start. Next, find the length of each group. Finally, calculate the number of events based on your logic:

import numpy

a = numpy.random.randint(0,2,20)

# Add an initial 0
a1 = numpy.r_[0, a]

# Mark the start of each group of 1's
d1 = numpy.diff(a1) > 0

# Indices of the start of groups of 1's
w1 = numpy.arange(len(d1))[d1]

# Length of each group
cs = numpy.cumsum(a)
c = numpy.diff(numpy.r_[cs[w1], cs[-1]+1])

# Apply the counting logic
storms = c - c//2

print(a)
>>> array([0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])
print(c)
>>> array([1, 2, 4, 1, 3])
print(storms)
>>> array([1, 1, 2, 1, 2])

You could save more memory than what I show here by reusing variable names after they are no longer needed, etc.

Comments

0

So I think you want:

storms_in[:,i,j] = [0,0,1,1,0,1,1,1,0,1,0,1,1,1,1,0]
storms_out[:,i,j]= [0,0,1,0,0,1,0,1,0,1,0,1,0,0,1,0]

This is not what your code sample is doing, but is what you say you want to do in your second paragraph.

To do this, you'd need two steps

def storms_disc(storms):  # put the whole array here, boolean-safe
    z = np.zeros((1,) + storms.shape[1:]) # zero-pads for the ends
    changes = np.r_[storms.astype('int8') ,z] - np.r_[z, storms.astype('int8')]  #find where the weather changes
    changes=((changes[:-1] == 1) | (changes[1:] == -1)).astype('int8') # reduce dimension
    return ((np.r_[changes, z] - np.r_[z, changes])[:-1] == 1).astype(storms.dtype) #find the first of successive changes

This vectorizes the whole process, and you'd only need to call it 8 times. The astype calls are because subtracting booleans causes an error, even though their value is 1 and 0

Testing:

storms=np.random.randint(0,2,90).reshape(10,3,3)
storms.T

array([[[1, 0, 0, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 1, 1, 0, 1, 1, 0, 0, 1],
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]],

       [[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]],

       [[0, 1, 0, 1, 0, 1, 1, 0, 0, 0],
        [0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]]], dtype=int8)

storms_disc(storms).T

array([[[1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]],

       [[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]],

       [[0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]]], dtype=int8)

3 Comments

Note that you can view (a.view(numpy.bool)) an 8-bit int array as boolean, because Numpy boolean types are also 8 bit. This saves the type conversion.
Not sure where I'd do that. I'm mostly converting to int8 so I can subtract. I suppose I could replace the .astype(storms.dtype) at the end.
The "view" trick works the other way around too... But yes, I read your code a little bit too fast.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.