Algorithm to find the number of specific continuous substrings

Question

I am trying to solve the following algorithmic problem:

Every day, Bob goes to work and does one of 26 possible tasks. The tasks are coded with the letters of the English alphabet from a to z. Sometimes Bob performs tasks assigned by his boss, and sometimes he can choose which of the 26 possible tasks he will do himself. Such days are marked with the symbol !. The boss has defined Bob's work plan for the upcoming N days. Bob really dislikes doing the same task for several days in a row, and to show his boss how bored he is, he decided to count the number of ways to choose a continuous segment of at least two days during which he will perform the same task every day.

That is, Bob considers all possible segments from day L to day R, where L < R, and if he can choose his work such that the tasks on all days in this segment are the same, then he considers this segment boring.

Help Bob determine the number of boring segments in the work plan for the next N days while he is bored.

Input Format

A single line is entered, which is a sequence of characters consisting of uppercase English letters and the symbol ! — the work plan for the next N (1 ≤ N ≤ 1000000) days.

Output Format

Output a single number — the number of boring segments in the plan.

Example 1

Input: a!b!c

Output: 5

Example 2

Input: a!

Output: 1

Notes

All possible segments in the first example are between | and |
|a!|b!c
a|!b|!c
a|!b!|c
a!|b!|c
a!b|!c|

I formalized the task as follows:

Problem

Given a work plan for N days as a string consisting of uppercase English letters ('a' through 'z') and exclamation marks ('!'), count the number of continuous segments of at least two days (i.e., from day L to day R, where 1≤L<R≤N) such that it is possible for Bob to perform the same task on each day within the segment. A segment is considered "boring" if either:

All characters within the segment are the same non-'!' character.
The segment contains at least one exclamation mark ('!').

Input

A single string representing the work plan for N days.

The string will consist only of uppercase English letters ('a' through 'z') and exclamation marks ('!').

The length of the string N will be between 1 and 1,000,000, inclusive (1≤N≤10⁶).

Output

A single integer representing the total number of "boring" segments in the given work plan.

My Code

func solve() {
  guard let plan = readLine() else { return }
  let n = plan.count
  let planArray = Array(plan)
  var boringSegmentCount = 0
  for i in 0..<n {
    var nonWildcards = Set<Character>()
    for j in i..<n {
      if planArray[j] != "!" {
        nonWildcards.insert(planArray[j])
      }
      if j > i && nonWildcards.count <= 1 {
        boringSegmentCount += 1
      }
    }
  }
  print(boringSegmentCount)
  
}

The provided Swift code aims to solve the problem by iterating through all possible continuous segments of the input string plan.

It uses nested loops to iterate through all possible continuous segments of the string:

The outer loop (i) determines the starting index of the segment.
The inner loop (j) determines the ending index of the segment.

For each segment (from index i to j):

It creates a Set called nonWildcards to store unique characters that are not '!'.
It iterates through the characters within the current segment.
If a character is not '!', it adds it to the nonWildcards set.
It checks if the length of the segment is greater than 1 (j > i) and if the number of unique non-wildcard characters in the segment is at most 1 (nonWildcards.count <= 1).
If both conditions are true, it increments the boringSegmentCount.

The main issue with this algorithm is its quadratic time complexity, O(n²) so for large values of n (up to 1,000,000 as stated in the problem), this algorithm can be very slow and lead to a time limit exceeded error.

My question

Is it possible to propose an algorithm with lower asymptotics for this task? I suppose it's possible to solve this problem in linear time, but to my shame, I still can't come up with a suitable logic.

Where did you find this problem? Please share a link and cite the author. — Bergi
– Bergi, Commented Apr 15 at 8:52

maraca · Accepted Answer · 2025-04-15 15:04:36Z

9

You can solve this in O(n) with the classic "how many qualifying segments end here" approach:

function solve(plan) {
    let total = 0,
        exclamCount = 0,
        letterCount = 0,
        letter
    for (let i = 0; i < plan.length; i++) {
        if (plan[i] == '!') {
            if (letter)
                total += letterCount++
            else
                total += exclamCount
            exclamCount++
        } else if (plan[i] == letter) {
            exclamCount = 0
            total += letterCount++
        } else {
            letter = plan[i]
            letterCount = exclamCount + 1
            total += exclamCount
            exclamCount = 0
        }
    }
    return total 
}

console.log(solve('a!b!c'))

Explanation: If you are on a ! then the longest sequence ending here is the one only using ! and a single letter (possibly multiple times). If you are on a letter then the longest sequence ending here is the one only using ! and this letter. In both cases the total number of sequences ending here is the longest sequence - 1 (because minimal sequence length is 2). So by keeping track of maximal exclamation and single letter spans we can easily calculate the number of sequences ending at position i going from left to right.

edited Apr 15 at 15:04

answered Apr 15 at 13:36

maraca

9,5873 gold badges27 silver badges50 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

clearcut3000 Apr 15 at 18:13

Thank you very much, dear maraca. A really elegant solution, I will try it!

maraca Apr 15 at 18:34

@clearcut3000 you are welcome, I don't know that language, if (letter) could be if (letter != null) or something similar, the rest looks like it should work with minor syntax changes. Or if (letterCount > 0) would work too there.

maraca Apr 15 at 18:51

Btw. you can also solve for the number of unique spans that can be created, for this we just have to add exclamCount * 25 to the total (before incrementing it) when encountering a ! (the 26th is counted already by the letterCount, except initially, then it is exclamCount * 26 with no letterCount).

Collectives™ on Stack Overflow

Algorithm to find the number of specific continuous substrings

Input Format

Output Format

Example 1

Example 2

Notes

Problem

Input

Output

My Code

My question

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Input Format

Output Format

Example 1

Example 2

Notes

Problem

Input

Output

My Code

My question

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related