Identifying an event from a big Python list efficiently

Question

I have a big Python list containing integers.

Below is the definition of my event.

Look at each element of the list and check whether it is greater than a specified number, say S. If there is a zero somewhere before that element, the event is recorded. The zero need not be immediately before the element, but it must occur after the previous event of this kind.

Let's say I have this list, with S = 6:

List = [3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7]

In the above list there are 2 such events, occurring at indices 7 and 10.

I am seeking expert advice on whether there is an efficient method to achieve this. I can run a for loop, but for a large list that may not be efficient.

Answer 1 · 2025-11-01 13:23:57Z

John Gordon

• Nov 1 at 13:23

I don't think there is any way to avoid a for loop.

Answer 2 · 2025-11-01 14:29:11Z

Kelly Bundy

• Nov 1 at 14:29

What exactly is the result you want? The number of events? Or some "list" of their indices? Something else?

Answer 3 · 2025-11-01 14:29:39Z

Even with a fairly well-specified question, the answer is still somewhat "it depends". My intuition is to partition the list on 0, and then iterate over each sublist looking for n > S. If the list is truly large, that approach is useful because you can process each sublist in parallel across lower-memory workers. There are also potentially other optimizations, such as eliminating sublists with max(sublist) <= S.

Partitioning enables you to skip parts of the list, but it still might not be the best choice, because splitting has some overhead.
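
A minimal sketch of the partitioning idea, for illustration only. The function name events_by_partition is hypothetical, and the sketch assumes S is non-negative (so a zero can never itself be an event) and that the wanted result is a list of indices. It runs sequentially; the parallel variant mentioned above is not shown.

```python
def events_by_partition(values, s):
    """Return event indices by scanning one zero-delimited segment at a time.

    Assumes s >= 0, so a zero can never count as an event itself.
    """
    events = []
    # Index of every zero; each segment runs from one zero to the next.
    zero_positions = [i for i, x in enumerate(values) if x == 0]
    boundaries = zero_positions + [len(values)]
    for start, end in zip(boundaries, boundaries[1:]):
        segment = values[start + 1:end]
        # Skip segments that cannot contain an event.
        if not segment or max(segment) <= s:
            continue
        # Only the first value above s in a segment is an event.
        for offset, x in enumerate(segment):
            if x > s:
                events.append(start + 1 + offset)
                break
    return events


print(events_by_partition([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))
```

Note that the max() check and the slice each add a pass over the segment, which is the kind of overhead mentioned above; whether the skipping pays off depends on the data.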

Answer 4 · 2025-11-01 14:50:24Z

Chris

• Nov 1 at 14:50

How is the list generated in the first place? Does its order otherwise matter? Do you know the value of S as the list is being built, or only once it has been fully built?

Answer 5 · 2025-11-01 15:03:20Z

Kelly Bundy

• Nov 1 at 15:03

Showing your for-loop solution and code for generating realistic data would be useful.

Answer 6 · 2025-11-01 15:04:48Z

John Gordon

• Nov 1 at 15:04

Start at the beginning of the list. Iterate until you see a zero. Iterate until you see a number greater than S. Repeat those two steps until you reach the end.
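
One possible way to write those two alternating steps, as a sketch: a single shared iterator is consumed by two inner loops in turn. The name find_events is made up for illustration, and returning a list of indices is an assumption, since the wanted output has not been confirmed.

```python
def find_events(values, s):
    """Alternate between looking for a zero and looking for a value > s."""
    events = []
    it = enumerate(values)  # shared iterator: both loops continue where the other stopped
    while True:
        # Step 1: advance until a zero is seen.
        for _, x in it:
            if x == 0:
                break
        else:
            return events  # list exhausted while looking for a zero
        # Step 2: advance until a value greater than s is seen.
        for i, x in it:
            if x > s:
                events.append(i)
                break
        else:
            return events  # list exhausted while looking for a value > s


print(find_events([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))
```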

Answer 7 · 2025-11-01 15:40:32Z

Brian Smith

• Nov 1 at 15:40

@Kelly, the integers can be any non-negative values

@Chris, yes, order does matter

Answer 8 · 2025-11-01 16:20:34Z

You declare a Boolean searching_for_a_0 = True, then iterate over the list. Inside the loop, you have:

if searching_for_a_0:
    if item == 0:
        searching_for_a_0 = False
else:
    if item > S:
        result.append(item)
        searching_for_a_0 = True

The solution of dividing the list into sublists involves iterating over the list to create the sublists, and then iterating over each of these, so it is not efficient.

Answer 9 · 2025-11-01 16:58:25Z

wjandrea

• Nov 1 at 16:58

that may not be efficient

Try it! You're pre-supposing that there's even a problem in the first place. A naive solution might be perfectly fine; you don't know until you try. For related tips, check out Which is faster? by Eric Lippert.
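
For instance, a rough timing harness along these lines could show whether the plain loop is already fast enough. Everything here is an assumption made for illustration: the function name naive_events, the random data (10,000 integers from 0 to 999), and the threshold of 500. Realistic data may behave quite differently.

```python
import random
import timeit


def naive_events(values, s):
    """Plain for loop: record the first value > s after each zero."""
    events = []
    found_zero = False
    for i, x in enumerate(values):
        if found_zero:
            if x > s:
                events.append(i)
                found_zero = False
        elif x == 0:
            found_zero = True
    return events


# Made-up test data; replace with something representative of the real list.
random.seed(0)
data = [random.randrange(1000) for _ in range(10_000)]

runs = 100
elapsed = timeit.timeit(lambda: naive_events(data, 500), number=runs)
print(f"{elapsed / runs * 1e3:.3f} ms per call")
```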

Answer 10 · 2025-11-01 17:10:18Z

wjandrea

• Nov 1 at 17:10

You didn't answer Kelly's question about the output format. At first glance, it sounds like you want a list, but on closer reading, events are only spoken about in the singular. If you only need to return a scalar, that's a way easier problem to solve and optimize.
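
If a scalar is enough, a sketch like the following would do, assuming the scalar wanted is the number of events (that is a guess; count_events is a hypothetical name). It needs no index bookkeeping and no result list.

```python
def count_events(values, s):
    """Count events: a value > s that follows a zero seen since the last event."""
    count = 0
    armed = False  # True once a zero has been seen since the last event
    for x in values:
        if armed:
            if x > s:
                count += 1
                armed = False
        elif x == 0:
            armed = True
    return count


print(count_events([3, 5, 6, 0, 2, 5, 6, 8, 3, 0, 7], 6))
```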

Answer 11 · 2025-11-04 16:46:09Z

We all know the naive solution: you need to loop over all the items in your list, as in this code:

def just_loop_over(ls, k):
    found_zero = False
    events = []
    for i, x in enumerate(ls):
        if found_zero:
            if x > k:
                events.append(i)
                found_zero = False
        else:
            if x == 0:
                found_zero = True
    return events

So my idea is to loop over the list twice, but let Python builtins do their job: list.index() is far more efficient than anything we could ever write, and if we check for an item greater than k only after a zero, we can reasonably hope to skip a large number of items.

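def use_builtins(ls, k):
    events = []
    zeros = []
    z = -1
    still_good = True
    # First pass: collect the index of every zero with list.index().
    while still_good:
        try:
            z = ls.index(0, z + 1)
            zeros.append(z)
        except ValueError:  # raised when no further zero exists
            still_good = False
    zeros.append(len(ls))  # sentinel: the last segment ends with the list
    # Second pass: in each zero-to-zero segment, record the first item greater than k.
    for z in range(len(zeros) - 1):
        for i, x in enumerate(ls[zeros[z] + 1:zeros[z + 1]], start=zeros[z] + 1):
            if x > k:
                events.append(i)
                break
    return events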

I tried both functions on a randomly generated list of 10k values ranging from 0 to 999 (so I could expect about ten instances of zero), and the speed gain is about 2.5x, but I expect it to be greater than that on bigger lists.

Of course it depends on how many zeros are in the list and where they are, and on where the first item greater than k falls after each zero. So for bigger values of k it will be a bit slower, and for smaller values a bit faster.

As far as I know, if you want something more efficient, your only option is multiprocessing.
