0

I have an array like this

sample = np.array([[9.99995470e-01],
                   [9.99992013e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99775827e-01],
                   [9.99439061e-01],
                   [9.98361528e-01],
                   [9.96853650e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99999762e-01]])

I want to get the max index where the values = 1 and it occurs consecutively more than 5 times. So the output should be index no 15.

I wonder if there is a simple function to solve this

1
  • 1
    I guess you need to find [1, 1, 1, 1, 1] instead of just == 1 check this answer Commented May 12, 2020 at 11:33

6 Answers 6

2

Using groupby

Code

import numpy as np
from itertools import groupby

def find_max_index(arr):

  # consecutive runs of ones
  # Use enumerate so we have the index with each value
  run_ones = [list(v) for k, v in groupby(enumerate(sample.flatten()), lambda x: x[1]) if k == 1]

  # Sorting by length to insure that max is at end of the list of lists
  # Since this is a stable last item will still be the largest index
  run_ones.sort(key=len) 

  last_list = run_ones[-1]
  if len(last_list) > 5:        # need max to have at least a run of five
    return last_list[-1][0]     # index of last value in max run of ones
  else:
    return None

print(find_max_index(sample))

# Output: 15

Explanation

function find_max_index

  1. groupby keeps groups runs of ones in sublist. Each item is index, value pair (from enumerate)

    run_ones = [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]

  2. Sort list to insure max is at end

    run_ones: [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]

  3. Last list containing run of ones

    last_list: [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]

  4. Index of the last tuple in last_list

    last_list[-1][0]

Sign up to request clarification or add additional context in comments.

1 Comment

It should be not just the biggest index of one but the highest index of the longest window of ones.
1

This should give you the index of the last appearance of 1 in a group of 5.

Input:

max([index for index, window in enumerate(windowed(sample, 5)) if list(window) == [1]*5]) + 4

Output:

15

1 Comment

What is windowed? Do you import it from somewhere?
1

Here is a function that will solve your problems for you

def find_repeated_index(sample, min_value, min_repeats):
  max_index = -1
  history   = []
  for index, value in enumerate(np.array(sample).flatten()):
    if value >= min_value: 
        history.append(value)
        if len(history) >= min_repeats: max_index = index
    else:
        if len(history) >= min_repeats: break                  
        history = []
  return max_index

find_repeated_index(sample, 1.0, 5)
15

find_repeated_index(sample, 1.0, 4)
5

Comments

1

Here's how you can solve this with O(n) runtime complexity and without allocating extra memory (not counting flattening and to list transformation).

def find_last_index_of_longest_window(array, window_value):

    if len(array) <= 0:
        return -1

    if len(array) == 1:
        return 0 if array[0] == window_value else -1

    max_length = 0
    length = 0

    for i, value in enumerate(array):
        if value == window_value:
            length += 1
        else:
            if length >= max_length:
                max_length = length
                max_index = i - 1
                length = 0

    if length > max_length:
        max_length = length
        max_index = i

    return max_index


print(find_last_index_of_longest_window(sample.flatten().tolist(), 1.0))

UPDATE: If you want to avoid flattening and conversion to list:

def find_last_index_of_longest_window(array, window_value):

    if len(array) <= 0:
        return -1

    if len(array) == 1:
        return 0 if array[0][0] == window_value else -1

    max_length = 0
    length = 0

    for i, item in enumerate(array):
        value = item[0]
        if value == window_value:
            length += 1
        else:
            if length >= max_length:
                max_length = length
                max_index = i - 1
                length = 0

    if length > max_length:
        max_length = length
        max_index = i

    return max_index


print(find_last_index_of_longest_window(sample, 1.0))

2 Comments

this one works too, but how do I determine which one is the most efficient way?
In this case, just pass the numpy array to the function and adjust the function to take the first element of the array item (because it's array of one-item arrays) but in this case the function won't be generic and will depend on numpy array layout.
0

Based on this snippet:

def find_runs(x):
    """Find runs of consecutive items in an array."""

    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]

    # handle empty array
    if n == 0:
        return np.array([]), np.array([]), np.array([])

    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]

        # find run values
        run_values = x[loc_run_start]

        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))

        return run_values, run_starts, run_lengths

# Part added by me

values,indices,lengths = find_runs(sample.flatten())
ones = np.where(values==1)
fiveormore = np.where(lengths[ones]>=5)
r = indices[ones][fiveormore]
last_indices = r + lengths[ones][fiveormore] - 1 

The last_indices variable will be an array of the last indices of each 5 or longer consecutive part of the array where the value is 1. Getting the last of these indices is just a last_indices[-1] call. If there are no such indices the array will be empty.

Comments

0

Quick profiling for large arrays shows that the following solution based on the code from Counting consecutive 1's in NumPy array will be significantly faster than the other ones presented here:

import numpy as np


def group_cumsum(a):
    """Taken from https://stackoverflow.com/a/42129610"""
    a_ext = np.concatenate(([0], a, [0]))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]


array = sample[:, 0]
value = 1
n = 5

mask = array == value
cumsums = group_cumsum(mask)
if not np.any(cumsums > n):
    print(f"No more than {n} consecutive {value}'s are found in the array.")
else:
    index = len(sample) - np.argmax(cumsums[::-1] > n) - 1
    print(index)  # returns 15 for your example

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.