
This is a simple question, but I couldn't find a 'best practice' and wondered if there was something simpler than lots of if statements. Say I have a method which takes in a variable 'data' of shape (N,M), where N can vary. Specifically, sometimes data is a 1D array of shape (M,); other times N=100 and data has shape (100,M), for example.

Below is skeleton code of what the method does when N>1. How can I adapt this to the general case, where N>1 or N can equal 1 (or, preferably, where data.shape=(M,), not just (1,M))? I can put in lots of if statements, but I was hoping for a cleaner solution.

import numpy as np

# start with data.shape=(N,M), vol.shape=(M,), jstarts.shape=jends.shape=(4,)
N = 3
# N = 1  # uncomment to test
M = 20
jstarts = np.array([0, 5, 12, 15])
jends = np.array([3, 10, 14, 18])
vol = np.ones(M)  # placeholder weights (assumed); any array of shape (M,) works
data = np.arange(N * M).reshape(N, M)
data_new = np.empty((N, M))

for i in range(N):
    for j in range(jstarts.size):
        jstart = jstarts[j]
        jend = jends[j]
        # volume-weighted mean of the segment, written back to every column in it
        tmp = np.sum(data[i, jstart:jend] * vol[jstart:jend]) / np.sum(vol[jstart:jend])
        data_new[i, jstart:jend] = tmp

*NOTE: jstart and jend depend on j, but don't depend on i
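
One possible way to avoid branching on the input shape, sketched here under assumptions: promote the input to 2D with np.atleast_2d, run the same loops, and reshape the result back to the caller's shape at the end. The function name segment_weighted_mean and the choice to fill columns outside every segment with NaN are hypothetical, not part of the original code (which leaves those columns uninitialized).

```python
import numpy as np

def segment_weighted_mean(data, vol, jstarts, jends):
    """Hypothetical wrapper: accepts data of shape (M,) or (N, M)."""
    data = np.asarray(data, dtype=float)
    data2d = np.atleast_2d(data)              # (M,) -> (1, M); (N, M) unchanged
    out = np.full(data2d.shape, np.nan)       # assumption: untouched columns stay NaN
    for i in range(data2d.shape[0]):
        for jstart, jend in zip(jstarts, jends):
            w = vol[jstart:jend]
            out[i, jstart:jend] = np.sum(data2d[i, jstart:jend] * w) / np.sum(w)
    return out.reshape(data.shape)            # restore the caller's original shape
```

The loops are unchanged from the skeleton above; only the entry and exit of the function deal with the 1D case.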

  • for i in range(...): for j in range(...): do_sth() is not idiomatic, and numpy's special strength is that it makes operations on slices fast and expressive. Commented Nov 9, 2015 at 18:30
  • I simplified the code and added details. This is simplified from the actual code (jstart and jend are replacements for a complicated np.where()). I know range for loops aren't very Pythonic. Maybe I should just refactor. I think the updated code can be done simply with einsum, with if statements choosing the subscript string depending on whether N==1. Commented Nov 9, 2015 at 18:59
  • There's still a nested loop. Commented Nov 9, 2015 at 19:03
  • Range for loops are very Pythonic. It's numpy that provides alternatives, usually for things that can be done in parallel. But some problems are inherently serial. Commented Nov 9, 2015 at 19:03
  • Without sample values - N,M,data,vol,jstart,jend - I think this question should be closed. If you want help, make it easy to understand and test. Commented Nov 9, 2015 at 19:39

1 Answer


It's been a while since I've spent a ton of time with numpy, but IIRC, you should be able to drop at least the outer loop by using Ellipsis:

for j in range(jstarts.size):
    jstart = jstarts[j]
    jend = jends[j]
    tmp = np.sum(data[...,jstart:jend]*vol[jstart:jend], axis=1)/np.sum(vol[jstart:jend])
    data_new[..., jstart:jend] = tmp
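
To see what the Ellipsis buys here, and what shapes come out of the sum, here is a small sketch with made-up arrays (the values and names are illustrative, not from the question). data[..., a:b] slices the last axis however many leading axes there are, and axis=-1 names the last axis for both 1D and 2D input, whereas axis=1 does not exist for a 1D array.

```python
import numpy as np

M = 6
vol = np.ones(M)                                      # illustrative weights
one_d = np.arange(M, dtype=float)                     # shape (M,)
two_d = np.arange(3 * M, dtype=float).reshape(3, M)   # shape (3, M)

for data in (one_d, two_d):
    seg = data[..., 1:4]                              # always slices the last axis
    total = np.sum(seg * vol[1:4], axis=-1)           # axis=-1 works for 1D and 2D
    print(data.shape, seg.shape, np.shape(total))
# prints:
# (6,) (3,) ()
# (3, 6) (3, 3) (3,)
```

Note that the reduced axis disappears from the result: a scalar for 1D input and shape (N,) for 2D input.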

6 Comments

Ah, I knew about the Ellipsis, but didn't realize np.sum() could operate like that (I thought I would have to use einsum). I think this solves the immediate problem, thanks.
Or, perhaps len(data.shape) - 1?
This doesn't actually work as is. In the original post, np.sum() produces a single value, which is replicated into data_new at indices jstart:jend. With the Ellipsis, tmp is a 1D array of shape (N,), which would then have to be placed into a slot of shape (N, jend-jstart), so I think I would have to replicate the 1D array manually.
@Michael -- As I said, it's been a while since I've done much with numpy. Can you np.sum(data[..., jstart:jend].ravel())?
I appreciate the help. The .ravel will only flatten the data and produce a scalar, but really the 1d array that normally results has distinct values and needs to be replicated. I think I just have to bite the bullet and put in an if statement to do tmp=(tmp[np.newaxis,...]).repeat(N,axis=0) if N>1
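
The replication problem discussed in these comments can also be handled by broadcasting instead of an explicit repeat. A minimal sketch, assuming the segments from the question: passing keepdims=True to np.sum keeps the reduced axis with length 1, so the result has shape (N, 1) for 2D input or (1,) for 1D input, and it broadcasts across jstart:jend on assignment. The function name segment_means, the sample vol values, and the NaN fill for columns outside every segment are assumptions made for illustration.

```python
import numpy as np

def segment_means(data, vol, jstarts, jends):
    """Hypothetical helper: volume-weighted mean per segment along the last axis."""
    data = np.asarray(data, dtype=float)
    out = np.full(data.shape, np.nan)        # assumption: untouched columns stay NaN
    for jstart, jend in zip(jstarts, jends):
        w = vol[jstart:jend]
        # keepdims=True leaves a trailing length-1 axis: (N, 1) or (1,)
        tmp = np.sum(data[..., jstart:jend] * w, axis=-1, keepdims=True) / np.sum(w)
        out[..., jstart:jend] = tmp          # broadcasts across the segment
    return out

M = 20
jstarts = np.array([0, 5, 12, 15])
jends = np.array([3, 10, 14, 18])
vol = np.arange(1, M + 1, dtype=float)       # sample weights, not from the question

res_2d = segment_means(np.arange(3 * M).reshape(3, M), vol, jstarts, jends)
res_1d = segment_means(np.arange(M), vol, jstarts, jends)
print(res_2d.shape, res_1d.shape)            # (3, 20) (20,)
```

The same function body serves both shapes, so no check on N is needed.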
