I have some strings like : 'I go to by everyday' and 'I go to school by bus everyday' and 'you go to home by bus everyday' in python. I want to know that is it possible to convert the first one to the other ones only by inserting some characters ? if yes get the characters and where they must to insert! I used difflib.SequenceMatcher but in some string that have duplicated words it didn't work!
4 Answers
Let's restate the problem and say we are checking to see if s1 (e.g. "I go to by everyday") can become s2 (e.g. "I go to school by bus everyday") with just inserts. This problem is very simple if we were to look at the strings as ordered sets. Essentially we are asking if s1 is a subset of s2.
To solve this problem a greedy algorithm would suffice (and be the fastest). We iterate through each character in s1 and try to find the first occurrence of that character in s2. Meanwhile, we keep a buffer to hold all the mismatched characters that we run into while looking for the character, and the position where we started filling in the buffer in the first place. When we do find the character we are looking for, we dump the position and content of the buffer into a place holder.
When we hit the end of s1 before s2, that would effectively mean s1 is a subset of s2 and we return the placeholder. Otherwise s1 is not a subset of s2 and it is impossible to form s2 from s1 with just inserts, so we return false. This greedy algorithm would take O(len(s1) + len(s2)) and here is the code for it:
# we are checking if we can make s2 from s1 just with inserts
def check(s1, s2):
# indices for iterating through s1 and s2
i1 = 0
i2 = 0
# dictionary to keep track of where to insert what
inserts = dict()
buffer = ""
pos = 0
while i1 < len(s1) and i2 < len(s2):
if s1[i1] == s2[i2]:
i1 += 1
i2 += 1
if buffer != "":
inserts[pos] = buffer
buffer = ""
pos += 1
else:
buffer += s2[i2]
i2 += 1
# if possible return the what and where to insert, otherwise return false
if i1 == len(s1):
return inserts
else:
return False
Comments
It is possible to convert the first one to the second, but not the third. E.g.:
words = sentence.split()
words.insert("school", 3)
words.insert("bus", 6)
sentence = ' '.join(words)
It is not possible to convert the second into any of the other strings by insertions only, since the "I" needs to be replaced by a "you".
Comments
Interesting problem. You can use regular expressions to check that. Something like that:
main = "I go to by everyday"
to_search = "I go to school by bus everyday"
import re
pattern = [chr for chr in main]
pattern = "(.*?)" + "(.*?)".join(pattern) + "(.*?)"
pattern = pattern.replace(' ', '\s{1}')
pattern = re.compile(pattern)
pattern.findall(to_search)
It might not be perfect though (won't work with characters which have to be escaped like \, but I'm sure you can tweak it).
Note, that you are saying only by inserts. So if take the string I go to by everyday and put any number of characters between characters in the string I will get what I want. Hence the idea to convert that string to the following regular expression:
(.*?)I(.*?)\s{1}(.*?)g(.*?)o(.*?)\s{1}(.*?)t(.*?)o(.*?)\s{1}(.*?)b(.*?)y(.*?)
Note that white space have been replaced by \s{1} operator and (.*?) basically means "catch all and put it in a group". For the first string you should get the following result:
[('', '', '', '', '', '', '', '', 'school ', '', '', 'bus ', '', '', '', '', '', '', '', '')]
which will tell you what strings you have to insert before each character in the original string (plus the last one at the end).
For the second string you will receive an empty list since the initial string is not convertible to it (missing "I" character).
Comments
You could walk over both strings in parallel, maintaining an index into each string. For each equal character, you increment both indices - for each unequal character you just increase the index into the string to test against:
def f(s, t):
"""Yields True if 's' can be transformed into 't' just by inserting characters,
otherwise false
"""
lenS = len(s)
lenT = len(t)
# 's' cannot possible get turned into 't' by just insertions if 't' is shorter.
if lenS > lenT:
return False
# If both strings are the same length, let's treat 's' to be convertible to 't' if
# they are equal (i.e. you can transform 's' to 't' using zero insertions). You
# may want to return 'False' here.
if lenS == lenT:
return s == t
idxS = 0
for ch in t:
if idxS == lenS:
return True
if s[idxS] == ch:
idxS += 1
return idxS == lenS
This gets you
f('', 'I go to by everyday') # True
f('I go to by everyday', '') # False
f('I go to by everyday', 'I go to school by bus everyday') # True
f('I go to by everyday', 'you go to home by bus everyday') # False
difflib.SequenceMatcherdoes not work for you.I go to by everydayand'you go to home by bus everyday? After all, you cannot get the latter from the former just by inserting characters (theIat the beginning avoids this).regexmodule supports fuzzy matching:regex.match('(?:%s){i}' % ''.join(map(regex.escape, first_one)), other). Note: It doesn't answer where and what characters to insert.