converting a string to another only by insert in python

Question

I have some strings like : 'I go to by everyday' and 'I go to school by bus everyday' and 'you go to home by bus everyday' in python. I want to know that is it possible to convert the first one to the other ones only by inserting some characters ? if yes get the characters and where they must to insert! I used difflib.SequenceMatcher but in some string that have duplicated words it didn't work!

Please give an example where difflib.SequenceMatcher does not work for you. — Janne Karila
– Janne Karila, Commented Aug 14, 2013 at 6:49
Should the function yield true if it gets I go to by everyday and 'you go to home by bus everyday? After all, you cannot get the latter from the former just by inserting characters (the I at the beginning avoids this). — Frerich Raabe
– Frerich Raabe, Commented Aug 14, 2013 at 8:39
related: Is string X a subsequence of string Y? regex module supports fuzzy matching: regex.match('(?:%s){i}' % ''.join(map(regex.escape, first_one)), other). Note: It doesn't answer where and what characters to insert. — jfs
– jfs, Commented Aug 14, 2013 at 10:28

Joohwan · Accepted Answer · 2013-08-14 07:58:20Z

Let's restate the problem and say we are checking to see if s1 (e.g. "I go to by everyday") can become s2 (e.g. "I go to school by bus everyday") with just inserts. This problem is very simple if we were to look at the strings as ordered sets. Essentially we are asking if s1 is a subset of s2.

To solve this problem a greedy algorithm would suffice (and be the fastest). We iterate through each character in s1 and try to find the first occurrence of that character in s2. Meanwhile, we keep a buffer to hold all the mismatched characters that we run into while looking for the character, and the position where we started filling in the buffer in the first place. When we do find the character we are looking for, we dump the position and content of the buffer into a place holder.

When we hit the end of s1 before s2, that would effectively mean s1 is a subset of s2 and we return the placeholder. Otherwise s1 is not a subset of s2 and it is impossible to form s2 from s1 with just inserts, so we return false. This greedy algorithm would take O(len(s1) + len(s2)) and here is the code for it:



# we are checking if we can make s2 from s1 just with inserts
def check(s1, s2):

    # indices for iterating through s1 and s2
    i1 = 0
    i2 = 0

    # dictionary to keep track of where to insert what
    inserts = dict()
    buffer = ""
    pos = 0

    while i1 < len(s1) and i2 < len(s2):
        if s1[i1] == s2[i2]:
            i1 += 1
            i2 += 1
            if buffer != "":
                inserts[pos] = buffer
                buffer = ""
            pos += 1
        else:
            buffer += s2[i2]
            i2 += 1

    # if possible return the what and where to insert, otherwise return false
    if i1 == len(s1):
        return inserts
    else:
        return False

Hans Then · Accepted Answer · 2013-08-14 06:24:38Z

0

It is possible to convert the first one to the second, but not the third. E.g.:

words = sentence.split()
words.insert("school", 3)
words.insert("bus", 6)
sentence = ' '.join(words)

It is not possible to convert the second into any of the other strings by insertions only, since the "I" needs to be replaced by a "you".

answered Aug 14, 2013 at 6:24

Hans Then

11.4k3 gold badges36 silver badges52 bronze badges

Comments

freakish · Accepted Answer · 2013-08-14 06:50:26Z

Interesting problem. You can use regular expressions to check that. Something like that:

main = "I go to by everyday"
to_search = "I go to school by bus everyday"

import re
pattern = [chr for chr in main]
pattern = "(.*?)" + "(.*?)".join(pattern) + "(.*?)"
pattern = pattern.replace(' ', '\s{1}')
pattern = re.compile(pattern)

pattern.findall(to_search)

It might not be perfect though (won't work with characters which have to be escaped like \, but I'm sure you can tweak it).

Note, that you are saying only by inserts. So if take the string I go to by everyday and put any number of characters between characters in the string I will get what I want. Hence the idea to convert that string to the following regular expression:

(.*?)I(.*?)\s{1}(.*?)g(.*?)o(.*?)\s{1}(.*?)t(.*?)o(.*?)\s{1}(.*?)b(.*?)y(.*?)

Note that white space have been replaced by \s{1} operator and (.*?) basically means "catch all and put it in a group". For the first string you should get the following result:

[('', '', '', '', '', '', '', '', 'school ', '', '', 'bus ', '', '', '', '', '', '', '', '')]

which will tell you what strings you have to insert before each character in the original string (plus the last one at the end).

For the second string you will receive an empty list since the initial string is not convertible to it (missing "I" character).

Frerich Raabe · Accepted Answer · 2013-08-14 08:52:24Z

You could walk over both strings in parallel, maintaining an index into each string. For each equal character, you increment both indices - for each unequal character you just increase the index into the string to test against:

def f(s, t):
  """Yields True if 's' can be transformed into 't' just by inserting characters,
  otherwise false
  """

  lenS = len(s)
  lenT = len(t)

  # 's' cannot possible get turned into 't' by just insertions if 't' is shorter.
  if lenS > lenT:
    return False

  # If both strings are the same length, let's treat 's' to be convertible to 't' if
  # they are equal (i.e. you can transform 's' to 't' using zero insertions). You
  # may want to return 'False' here.
  if lenS == lenT:
    return s == t

  idxS = 0
  for ch in t:
    if idxS == lenS:
      return True
    if s[idxS] == ch:
      idxS += 1

  return idxS == lenS

This gets you

f('', 'I go to by everyday')                               # True
f('I go to by everyday', '')                               # False
f('I go to by everyday', 'I go to school by bus everyday') # True
f('I go to by everyday', 'you go to home by bus everyday') # False

Collectives™ on Stack Overflow

converting a string to another only by insert in python

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related