Credit to Frank for his answer outlining the simplification to coalesce consecutive characters into single/multi character groups, forming a bitstring.
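For concreteness, here is one way that coalescing step can look (my own sketch, not necessarily Frank's exact code): each maximal run of equal characters becomes one group, and the only thing we need to remember about a group is whether it has one character or more.

```python
from itertools import groupby

def to_groups(string):
    # Coalesce maximal runs of equal characters into (char, run_length) pairs.
    return [(key, len(list(group))) for key, group in groupby(string)]

def to_bitvec(string):
    # Per group, keep only whether the run has 2+ characters.
    return [length > 1 for _, length in to_groups(string)]
```

For example, "aabba" coalesces to [('a', 2), ('b', 2), ('a', 1)], i.e. the bitvector [True, True, False].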
Building on this, I think it's beneficial to compare this problem to classic lattice problems like checking balanced parentheses, or to shift/reduce parsing with values on a stack. Each single character group needs to combine with a group either to its left or to its right. You can imagine this as the choice of either shifting the group onto the stack, or reducing the top of the stack by removing all groups of the same character (requiring 2+ characters total). There's a bit more to it than that; we have to lay out some rules for when we're allowed to perform each operation, but this captures the general idea.
Personally, I've been visualizing this graphically in my head as a lattice path in the upper half plane, where the y-axis is my stack size and the x-axis is the character group index. If I push a group onto the stack, I go one unit right and one unit up. We're trying to find a path from (0, 0) to (G, 0), where G is the number of character groups.
To simplify details and to mirror my code, I need to clarify some terminology. From here on, when I mention "reducing", I mean implicitly shifting the current group and then immediately reducing the top two groups of the stack. Besides being more compact, this also eliminates some redundancy in our stack state and makes it significantly easier to track the state of groups. Secondly, when I say a group is "stack-matching", I mean that the character(s) of that group match the character(s) of the group at the top of the stack. Lastly, when I say a group is "reducible" or can be reduced, I mean that (a) the stack is not empty and (b) the current group is stack-matching.
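To make these operations concrete, here is a toy simulator (my own illustration, not part of the solution) that applies one chosen operation per group and reports whether the stack ends up empty; the preconditions mirror the definitions above.

```python
def run_ops(groups, ops):
    # groups: list of (char, run_length); ops: 'shift' | 'reduce' | 'discard', one per group
    stack = []
    for (char, length), op in zip(groups, ops):
        matching = bool(stack) and stack[-1] == char  # "stack-matching"
        if op == 'shift':
            assert not matching          # only shift non-reducible groups
            stack.append(char)
        elif op == 'reduce':
            assert matching              # "reducible": stack-matching and stack non-empty
            stack.pop()                  # shift current group, then reduce the top two
        elif op == 'discard':
            # multi groups can vanish alone; singles only when stack-matching
            assert length > 1 or matching
    return not stack

# "abba" -> groups a, bb, a: shift "a", discard "bb", reduce "a" -> empty stack
print(run_ops([('a', 1), ('b', 2), ('a', 1)], ['shift', 'discard', 'reduce']))  # True
```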
With that, we can start to build our DP table. It will have 3 dimensions: the group index, a boolean for stack-matching, and the stack size. For the single character group, we have three rules:
- if reducible, try reducing current group
- if reducible, try discarding current group
- if not reducible, try shifting current group
The first and last are fairly self-explanatory. The second rule is a result of how I've defined "reduce" above. We want our stack to alternate character groups ("a", "b", "a", ...), but some solutions would require combining 3+ groups at once, so we emulate this by discarding the current stack-matching group, leaving the top of the stack alone to combine 1:1 with a matching group sometime later. For example, in "abbabba", the three single "a" groups must all merge into one run of three before being deleted; in the DP, the middle "a" is discarded and the outer two reduce against each other.
For the multi character group case, we also have three rules:
- if reducible, try reducing current group
- unconditionally, try discarding the current group
- if not reducible, try shifting current group
Again, the first and last rules are self-explanatory (they are the same as before). The second rule is again an artifact of how we've defined reduction: we require exactly two groups, but a multi character group can be eliminated on its own, without first combining with any other group (via shifting and/or reducing). For example, in "abba", the "bb" group is discarded by itself, leaving the two "a" groups to reduce against each other.
In each of these six cases, we need to be careful to track whether the next group will be stack-matching based on our operation, but luckily, due to the way we defined our reduce operation, we're guaranteed that the stack alternates characters, so we can trivially determine stack-matching in each case without auxiliary data or expensive computation.
Here is what it looks like implemented recursively with memoisation:
```python
from itertools import groupby
from functools import cache

def solve(string):
    bitvec = [len(list(group)) > 1 for key, group in groupby(string)]

    @cache
    def is_solvable(index, is_matching, stack_size):
        if index == len(bitvec):
            return stack_size == 0
        is_multi = bitvec[index]
        is_reducible = is_matching and stack_size > 0
        # (shift and) reduce group
        if is_reducible:
            if is_solvable(index + 1, True, stack_size - 1):
                return True
        # discard group
        if is_multi:
            # group doesn't need another group to combine with
            if is_solvable(index + 1, not is_matching, stack_size):
                return True
        else:
            # emulate 3+ way deferred combine / reduce
            if is_reducible and is_solvable(index + 1, False, stack_size):
                return True
        # shift group
        if not is_reducible:
            if is_solvable(index + 1, False, stack_size + 1):
                return True
        return False

    return is_solvable(0, False, 0)
```
In the average case, the code runs about 10x faster than the recursive solution provided in Frank's answer. When I inspected a handful of such inputs, I saw that the number of times the inner function gets called is within a small factor of G (the number of character groups). This is because I've ordered the rules to greedily try the options that reduce stack size first.
In the worst case, the code runs at least a couple of orders of magnitude faster. For the string aabaabbabbabbababababababababababaabbaabbabaababbabb I measured 0.0003595 vs 0.0477324 seconds, averaged across 50 samples. This is explained by the fact that the time complexity of my DP solution is O(G^2), whereas the recursion in the other answer has exponential worst-case time complexity. Also note that the worst case is always when the result is False, meaning the entire search space must be explored.
Statistically, strings that yield False become extremely rare as the string length grows. This means that in practice the worst case becomes less and less likely, and the average case (for both solutions) tends to be roughly linear in G. I had significant difficulty randomly generating long false strings without just outright "constructing" them (obviously with some added bias). Again though, this may not matter for real-world data.
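For anyone who wants to hunt for such strings (or cross-check the DP on small inputs), a brute-force reference is handy. This sketch assumes the underlying game is: the string is solvable iff it can be emptied by repeatedly deleting a run of 2+ equal adjacent characters; deleting a whole maximal run at a time loses no generality, since partial deletions within a run never change which distinct characters are adjacent. It's exponential, so keep inputs short.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def brute_force(s):
    # Exponential reference solution: try deleting each maximal run of 2+
    # equal adjacent characters and recurse on the remainder.
    if not s:
        return True
    return any(
        brute_force(s[:m.start()] + s[m.end():])
        for m in re.finditer(r'(.)\1+', s)
    )

print(brute_force("abba"))  # "abba" -> "aa" -> "" : True
print(brute_force("aab"))   # "aab" -> "b", stuck  : False
```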
I'll keep thinking about the problem and see if I can come up with any insights that could lead to the O(n) desired solution.