I have a big Python list containing integers.

Below is the definition of my event.

Look at each element of the list and check whether it is greater than a specified number, say S. If a zero occurs somewhere before that element, the event is recorded. The zero need not be immediately before the element, but it must come after the previous event.

Let's say I have this list, with S = 6:

List = [3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7]

In the above list there are 2 such events, occurring at indices 7 and 10.

I am seeking expert advice on whether there is an efficient method to achieve this. I can run a for loop, but for a large list that may not be efficient.

11 Replies

I don't think there is any way to avoid a for loop.

What exactly is the result you want? The number of events? Or some "list" of their indices? Something else?

Even with a fairly well-specified question, the answer is still somewhat "it depends". My intuition is to partition the list on 0, and then iterate over each sublist looking for the first n > S. If the list is truly large, that approach is useful because you can process each sublist in parallel across lower-memory workers. There are also other potential optimizations, for example eliminating sublists with max(sublist) <= S.
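Here is a minimal sketch of the partitioning idea, assuming the desired output is a list of indices (the question has not confirmed that yet). The name events_by_partition is made up for illustration, and whether the max() pre-check actually pays off will depend on the data.

```python
def events_by_partition(values, s):
    """Return the indices of events, scanning the segments between zeros."""
    # Positions of every zero, plus a sentinel marking the end of the list.
    zero_positions = [i for i, x in enumerate(values) if x == 0]
    zero_positions.append(len(values))

    events = []
    for start, end in zip(zero_positions, zero_positions[1:]):
        segment = values[start + 1:end]
        # Skip segments that cannot contain an event.
        if not segment or max(segment) <= s:
            continue
        # The first element above s in this segment is the event.
        for offset, x in enumerate(segment):
            if x > s:
                events.append(start + 1 + offset)
                break
    return events


print(events_by_partition([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))  # [7, 10]
```

Each segment is independent of the others, which is what would make it possible to hand segments to separate workers.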

Partitioning enables you to skip parts of the list, but it still might not be the best choice, because splitting has some overhead.

How is the list generated in the first place? Does its order otherwise matter? Do you know the value of S as the list is being built, or only once it has been fully built?

Showing your for-loop solution and code for generating realistic data would be useful.

Start at the beginning of the list. Iterate until you see a zero. Iterate until you see a number greater than S. Repeat those two steps until you reach the end.
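Those two steps can be sketched with a single shared iterator, so the second loop resumes where the first one stopped. This is only an illustration: the name events_two_phase is invented here, and it assumes you want the indices back.

```python
def events_two_phase(values, s):
    """Alternate between 'find a zero' and 'find a value greater than s'."""
    events = []
    it = enumerate(values)  # shared iterator: both loops consume from it
    while True:
        # Step 1: advance until a zero is seen.
        for _, x in it:
            if x == 0:
                break
        else:
            return events  # reached the end without finding a zero
        # Step 2: advance until a value greater than s is seen.
        for i, x in it:
            if x > s:
                events.append(i)
                break
        else:
            return events  # reached the end without finding an event


print(events_two_phase([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))  # [7, 10]
```

The else branch of each for loop runs only when that loop finishes without hitting break, which is how the function detects the end of the list.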

@kelly, the integers can be any non-negative value

@Chris, yes order does matter

You declare a Boolean searching_for_a_0 = True, then iterate over the list. Inside the loop, you have

if searching_for_a_0:
    if item == 0:
        searching_for_a_0 = False
else:
    if item > S:
        result.append(item)
        searching_for_a_0 = True

The solution of dividing the list into sublists involves iterating over the list to create the sublists, and then iterating over each of them, so it is not efficient.

"that may not be efficient"

Try it! You're pre-supposing that there's even a problem in the first place. A naive solution might be perfectly fine; you don't know until you try. For related tips, check out Which is faster? by Eric Lippert.

You didn't answer Kelly's question about the output format. On first glance, it sounds like you want a list, but on closer reading, events are only spoken about in the singular. If you only need to return a scalar, that's a way easier problem to solve and optimize.
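If a count is all you need, a sketch could look like the following. The name count_events is hypothetical, and it assumes the same event definition as in the question.

```python
def count_events(values, s):
    """Count events without storing their positions."""
    count = 0
    armed = False  # True once a zero has been seen since the last event
    for x in values:
        if armed:
            if x > s:
                count += 1
                armed = False
        elif x == 0:
            armed = True
    return count


print(count_events([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))  # 2
```

This needs no enumerate() and no result list, only a counter and a flag.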

We all know the naive solution: you need to loop over all the items in your list, as in this code:

def just_loop_over(ls, k):
    found_zero = False
    events = []
    for i, x in enumerate(ls):
        if found_zero:
            if x > k:
                events.append(i)
                found_zero = False
        else:
            if x == 0:
                found_zero = True
    return events

So my idea is to loop over the list twice, but let Python builtins do their job: list.index() is far more efficient than a loop we could write in pure Python, and if we check for an item greater than k only after a zero, we can reasonably hope to skip a large number of items.

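def use_builtins(ls, k):
    events = []
    zeros = []
    z = -1
    still_good = True
    # First pass: collect the index of every zero using list.index().
    while still_good:
        try:
            z = ls.index(0, z + 1)
            zeros.append(z)
        except ValueError:  # raised by index() when no more zeros are left
            still_good = False
    zeros.append(len(ls))  # sentinel so the last segment ends at the list end
    # Second pass: in each segment between zeros, record the first item > k.
    for z in range(len(zeros) - 1):
        for i, x in enumerate(ls[zeros[z] + 1:zeros[z + 1]], start=zeros[z] + 1):
            if x > k:
                events.append(i)
                break
    return events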

I tried both functions on a randomly generated list of 10k values ranging from 0 to 999 (so I could expect about ten zeros), and the speed gain is about 2.5x, but I expect it to be greater on bigger lists.

Of course it depends on how many zeros are in the list and where they are, and on where the first item greater than k after each zero is. So for bigger values of k it will be a bit slower, for smaller values a bit faster.
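For anyone who wants to repeat this kind of measurement, here is a rough timing harness. It is not the exact script I used: make_data, best_time and loop_events are names invented for this sketch, the seed and the values of k are arbitrary, and loop_events is just a compact stand-in for whichever function you want to time.

```python
import random
import timeit


def make_data(n=10_000, high=999, seed=0):
    """Build a reproducible list of n random integers in [0, high]."""
    rng = random.Random(seed)
    return [rng.randint(0, high) for _ in range(n)]


def loop_events(values, k):
    """Plain-loop stand-in; swap in any function with the same signature."""
    events = []
    found_zero = False
    for i, x in enumerate(values):
        if found_zero:
            if x > k:
                events.append(i)
                found_zero = False
        elif x == 0:
            found_zero = True
    return events


def best_time(func, values, k, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(values, k))
    return min(timer.repeat(repeat=repeat, number=number)) / number


data = make_data()
for k in (100, 500, 900):
    print(k, best_time(loop_events, data, k))
```

Taking the minimum over several repeats reduces noise from other processes; before comparing timings, it is also worth checking that the candidates return identical results on the same data.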

As far as I know, if you want something more efficient, your only option is multiprocessing.
