Finding maximum index of a specific value where it occurs consecutively more than N times

Question

I have an array like this

sample = np.array([[9.99995470e-01],
                   [9.99992013e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99775827e-01],
                   [9.99439061e-01],
                   [9.98361528e-01],
                   [9.96853650e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99999762e-01]])

I want to get the largest index at which the value equals 1 and belongs to a run of more than 5 consecutive ones. For the array above, the output should be index 15.

Is there a simple function that does this?

I guess you need to find [1, 1, 1, 1, 1] instead of just == 1; check this answer.
– Grijesh Chauhan, Commented May 12, 2020 at 11:33

DarrylG · Accepted Answer · 2020-05-12 13:49:30Z

2

Using groupby

Code

import numpy as np
from itertools import groupby

def find_max_index(arr):

  # consecutive runs of ones
  # Use enumerate so we have the index with each value
  run_ones = [list(v) for k, v in groupby(enumerate(arr.flatten()), lambda x: x[1]) if k == 1]

  if not run_ones:              # the array contains no ones at all
    return None

  # Sort by length to ensure that the longest run is at the end of the list of lists
  # Since the sort is stable, among equally long runs the one with the largest index stays last
  run_ones.sort(key=len)

  last_list = run_ones[-1]
  if len(last_list) > 5:        # longest run must contain more than five ones
    return last_list[-1][0]     # index of last value in the longest run of ones
  else:
    return None

print(find_max_index(sample))

# Output: 15

Explanation

Function find_max_index

groupby collects each run of ones into a sublist. Each item is an (index, value) pair (from enumerate):

run_ones = [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]

Sort the list by length to ensure the longest run is at the end:

run_ones: [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]

Take the last list, which holds the longest run of ones:

last_list: [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]

Return the index stored in the last tuple of last_list:

last_list[-1][0]
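
To see what groupby does with the enumerated values, here is a minimal sketch on a short list. The list `values` is made up for illustration and is not the array from the question.

```python
from itertools import groupby

# Hypothetical toy data, not the array from the question
values = [0.5, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0]

# groupby starts a new group whenever the key (here: the value) changes,
# so every group is one run of identical consecutive values
runs = [(key, [index for index, _ in group])
        for key, group in groupby(enumerate(values), key=lambda pair: pair[1])]

for key, indices in runs:
    print(key, indices)

# Keep only the runs of ones, as lists of indices
runs_of_ones = [indices for key, indices in runs if key == 1]
print(runs_of_ones)
```

Each run keeps its original indices, so the last index of any run is simply the last element of its list.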

edited May 12, 2020 at 13:49

answered May 12, 2020 at 11:38

DarrylG


1 Comment

Dmytro Over a year ago

It should be not just the biggest index of one but the highest index of the longest window of ones.
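
The two readings mentioned in this comment can give different results. The sketch below compares them on a made-up list; the helper `runs_of` and the data are hypothetical and only illustrate the distinction.

```python
from itertools import groupby

def runs_of(values, target):
    """Return (length, last_index) for each run of `target` in `values`."""
    found = []
    position = 0                      # number of elements consumed so far
    for key, group in groupby(values):
        size = sum(1 for _ in group)
        position += size
        if key == target:
            found.append((size, position - 1))
    return found

# Hypothetical data: a run of 7 ones, a zero, then a run of 6 ones
values = [1] * 7 + [0] + [1] * 6
n = 5

runs = runs_of(values, 1)

# Reading 1: last index of the longest run (ties go to the later run)
end_of_longest = max(runs)[1]

# Reading 2: largest index belonging to any run longer than n
end_of_last_long = max(end for size, end in runs if size > n)

print(runs, end_of_longest, end_of_last_long)
```

For the array in the question both readings agree, because the longest run of ones is also the last one.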

Noah Smith · Accepted Answer · 2020-05-12 11:38:46Z

1

This should give you the index of the last appearance of 1 in a group of 5.

Input:

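from more_itertools import windowed  # assumed source: the third-party more-itertools package

max(index for index, window in enumerate(windowed(sample.flatten(), 5)) if list(window) == [1]*5) + 4

Output:

15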

answered May 12, 2020 at 11:38

Noah Smith


1 Comment

Georgy Over a year ago

What is windowed? Do you import it from somewhere?
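
A helper named `windowed` exists in the third-party more-itertools package, which is presumably what the answer relies on. A similar sliding-window check can be sketched with NumPy alone, assuming NumPy 1.20 or later for `sliding_window_view`. The array `flat` and the window size are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical toy data: ones at indices 1-3 and 5-8
flat = np.array([0.9, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 1.0, 0.9])
n = 3

# Each row is one window of n consecutive elements
windows = sliding_window_view(flat, n)

# True for every window that consists only of ones
all_ones = (windows == 1).all(axis=1)
starts = np.flatnonzero(all_ones)

# A window starting at s ends at s + n - 1
last_index = int(starts[-1]) + n - 1 if starts.size else None
print(starts, last_index)
```

Like the one-liner above, this finds runs of at least n ones; use a window of n + 1 for "more than n".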

James McGuigan · Accepted Answer · 2020-05-12 11:41:09Z

1

Here is a function that will solve your problems for you

def find_repeated_index(sample, min_value, min_repeats):
    max_index = -1   # returned when no run is long enough
    history   = []   # values of the current run
    for index, value in enumerate(np.array(sample).flatten()):
        if value >= min_value:
            history.append(value)
            if len(history) >= min_repeats: max_index = index
        else:
            # stop at the end of the first run that is long enough
            if len(history) >= min_repeats: break
            history = []
    return max_index

find_repeated_index(sample, 1.0, 5)
15

find_repeated_index(sample, 1.0, 4)
5

answered May 12, 2020 at 11:41

James McGuigan


Dmytro · Accepted Answer · 2020-05-12 13:33:01Z

1

Here's how you can solve this in O(n) time without allocating extra memory (not counting the flattening and the conversion to a list).

def find_last_index_of_longest_window(array, window_value):

    max_length = 0
    max_index = -1   # returned when window_value never occurs
    length = 0

    for i, value in enumerate(array):
        if value == window_value:
            length += 1
        else:
            # a run just ended at i - 1; >= keeps the later of equally long runs
            if length > 0 and length >= max_length:
                max_length = length
                max_index = i - 1
            length = 0   # always reset once the run is broken

    # the array may end while a run is still open
    if length > 0 and length >= max_length:
        max_index = len(array) - 1

    return max_index


print(find_last_index_of_longest_window(sample.flatten().tolist(), 1.0))

UPDATE: If you want to avoid flattening and conversion to list:

def find_last_index_of_longest_window(array, window_value):

    max_length = 0
    max_index = -1   # returned when window_value never occurs
    length = 0

    for i, item in enumerate(array):
        value = item[0]   # each item is a one-element array
        if value == window_value:
            length += 1
        else:
            # a run just ended at i - 1; >= keeps the later of equally long runs
            if length > 0 and length >= max_length:
                max_length = length
                max_index = i - 1
            length = 0   # always reset once the run is broken

    # the array may end while a run is still open
    if length > 0 and length >= max_length:
        max_index = len(array) - 1

    return max_index


print(find_last_index_of_longest_window(sample, 1.0))

edited May 12, 2020 at 13:33

answered May 12, 2020 at 12:03

Dmytro


2 Comments

Ryru Lobo Over a year ago

this one works too, but how do I determine which one is the most efficient way?

Dmytro Over a year ago

In that case, pass the NumPy array to the function directly and adjust the function to take the first element of each item (because it's an array of one-item arrays). The function then won't be generic, though, since it depends on the NumPy array layout.
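
On the question of efficiency, one way to compare approaches is to time them on a larger input with `timeit`. The sketch below is not a benchmark of the answers on this page: both functions, the random data, and the repeat count are hypothetical stand-ins for a pure-Python loop and a vectorized NumPy variant.

```python
import timeit
import numpy as np

def last_index_loop(flat, value, n):
    """Plain Python loop: largest index inside a run longer than n, or -1."""
    best = -1
    length = 0
    for i, item in enumerate(flat):
        if item == value:
            length += 1
            if length > n:
                best = i
        else:
            length = 0
    return best

def last_index_numpy(flat, value, n):
    """Vectorized variant based on run boundaries, or -1."""
    # Pad with False so every run has both a start and a stop edge
    mask = np.concatenate(([False], flat == value, [False]))
    edges = np.flatnonzero(mask[1:] != mask[:-1])
    starts, stops = edges[::2], edges[1::2]          # stops are exclusive
    long_stops = stops[(stops - starts) > n]
    return int(long_stops[-1]) - 1 if long_stops.size else -1

# Hypothetical test data: 100,000 random zeros and ones
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=100_000).astype(float)

# Both variants should agree before their timings are compared
assert last_index_loop(data, 1.0, 5) == last_index_numpy(data, 1.0, 5)

repeats = 3
for func in (last_index_loop, last_index_numpy):
    seconds = timeit.timeit(lambda: func(data, 1.0, 5), number=repeats)
    print(f"{func.__name__}: {seconds / repeats:.4f} s per call")
```

Timings depend on the machine and the input, so it is worth measuring with data that resembles the real use case.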

Gábor Fekete · Accepted Answer · 2020-05-12 12:17:04Z

Based on this snippet:

def find_runs(x):
    """Find runs of consecutive items in an array."""

    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]

    # handle empty array
    if n == 0:
        return np.array([]), np.array([]), np.array([])

    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]

        # find run values
        run_values = x[loc_run_start]

        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))

        return run_values, run_starts, run_lengths

# Part added by me

values,indices,lengths = find_runs(sample.flatten())
ones = np.where(values==1)
fiveormore = np.where(lengths[ones]>=5)
r = indices[ones][fiveormore]
last_indices = r + lengths[ones][fiveormore] - 1

The last_indices variable will be an array holding the last index of each run of 5 or more consecutive ones. The last of these indices is simply last_indices[-1]. If there are no such runs, the array will be empty.
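
To make the run values, starts and lengths concrete, here is a compact sketch of the same run-length idea on a tiny array. The input `x` and the threshold `min_len` are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 1.0, 0.5, 1.0, 1.0, 1.0])   # hypothetical toy input

# A run starts at position 0 and wherever a value differs from its predecessor
is_start = np.concatenate(([True], x[1:] != x[:-1]))
run_starts = np.flatnonzero(is_start)
run_values = x[run_starts]
run_lengths = np.diff(np.append(run_starts, len(x)))

print(run_values, run_starts, run_lengths)

# Last index of every run of ones with at least min_len elements
min_len = 3
keep = (run_values == 1) & (run_lengths >= min_len)
last_indices = run_starts[keep] + run_lengths[keep] - 1
print(last_indices)
```

Here the runs are two ones, one 0.5 and three ones, so only the final run passes the length filter.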

Georgy · Accepted Answer · 2020-05-12 16:07:42Z

Quick profiling for large arrays shows that the following solution based on the code from Counting consecutive 1's in NumPy array will be significantly faster than the other ones presented here:

import numpy as np


def group_cumsum(a):
    """Taken from https://stackoverflow.com/a/42129610"""
    a_ext = np.concatenate(([0], a, [0]))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]


array = sample[:, 0]
value = 1
n = 5

mask = array == value
cumsums = group_cumsum(mask)
if not np.any(cumsums > n):
    print(f"No more than {n} consecutive {value}'s are found in the array.")
else:
    index = len(sample) - np.argmax(cumsums[::-1] > n) - 1
    print(index)  # returns 15 for your example
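
The array that `group_cumsum` returns holds, for each position, how many consecutive matches have been seen so far, dropping back to zero at every mismatch. The sketch below computes the same kind of running count in a different way, with `np.maximum.accumulate`, purely to show what the values look like. The mask and threshold are made up for illustration.

```python
import numpy as np

mask = np.array([True, True, False, True, True, True])   # hypothetical toy mask
positions = np.arange(len(mask))

# Position of the most recent False at or before each element (-1 if none yet)
last_false = np.maximum.accumulate(np.where(mask, -1, positions))

# Distance to that False is the length of the current run of True values
running_count = positions - last_false
print(running_count)

# Largest index whose running count exceeds n, or None if there is none
n = 2
hits = np.flatnonzero(running_count > n)
last_index = int(hits[-1]) if hits.size else None
print(last_index)
```

Any position whose count exceeds n lies inside a run of more than n matches, so the largest such position is the index being asked for.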
