Let's say I have two strings that contain similar (but not identical) substrings:

A = """Here is a test of a sentence with a few words in it. 
The rest of this sentence is different, though."""

B = """And here is a test of a sent;ence with a few wordz in it, 
as well. The quick brown fox jumped over the lazy dogs."""

How can I get the similar text between them, i.e. "Here is a test of a sentence with a few words in it" in A and "here is a test of a sent;ence with a few wordz in it" in B?

Edit: as far as I know, this isn't the same thing as calculating an edit distance. Sure, I can calculate an edit distance between "sentence" and "sent;ence", but that doesn't help me to identify the matching substrings.

  • What ideas do you have on this, and what have you tried? Commented Oct 22, 2016 at 16:23
  • Have you checked out difflib to see if it meets your use-case? Commented Oct 22, 2016 at 16:24
  • Start from here and end up here. Commented Oct 22, 2016 at 16:25
  • There is no easy way to achieve this. You need to write your own algorithm: split each string into a list of words, then look for common runs of words that differ by at most one intermediate word (or whatever your condition is). Commented Oct 22, 2016 at 16:26
  • Is it supposed to be sent;ence (in B and solution) and wordz (in B)? Or just a typo? Commented Oct 22, 2016 at 16:30
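
To make the difflib suggestion from the comments concrete, here is a minimal sketch rather than a definitive solution. It uses the standard library's `difflib.SequenceMatcher` to find exactly matching character blocks, then chains neighbouring blocks that are separated by only a few differing characters. The helper name `similar_region` and the thresholds `max_gap` and `min_block` are assumptions made up for this illustration; they would need tuning for other inputs. The comparison is done on lowercased copies, which assumes lowercasing does not change the string lengths (true for ASCII text like this).

```python
from difflib import SequenceMatcher

def similar_region(a, b, max_gap=3, min_block=3):
    """Return the longest pair of nearly matching substrings of a and b.

    Hypothetical helper: max_gap is the largest number of differing
    characters tolerated between two exact matches, and min_block
    discards tiny coincidental matches (this also drops difflib's
    zero-length terminator block).
    """
    # Compare case-insensitively; slice the original strings at the end.
    sm = SequenceMatcher(None, a.lower(), b.lower(), autojunk=False)
    blocks = [m for m in sm.get_matching_blocks() if m.size >= min_block]

    best = None  # (a_start, a_end, b_start, b_end) of the longest run
    run = None   # run currently being extended
    for m in blocks:
        if run and m.a - run[1] <= max_gap and m.b - run[3] <= max_gap:
            # Close enough to the previous block in both strings: extend.
            run = (run[0], m.a + m.size, run[2], m.b + m.size)
        else:
            # Gap too large in at least one string: start a new run.
            run = (m.a, m.a + m.size, m.b, m.b + m.size)
        if best is None or run[1] - run[0] > best[1] - best[0]:
            best = run

    if best is None:
        return "", ""
    return a[best[0]:best[1]], b[best[2]:best[3]]

A = ("Here is a test of a sentence with a few words in it. \n"
     "The rest of this sentence is different, though.")
B = ("And here is a test of a sent;ence with a few wordz in it, \n"
     "as well. The quick brown fox jumped over the lazy dogs.")

a_part, b_part = similar_region(A, B)
print(repr(a_part))  # 'Here is a test of a sentence with a few words in it'
print(repr(b_part))  # 'here is a test of a sent;ence with a few wordz in it'
```

The gap tolerance is what separates this from a plain longest-common-substring search: the stray ";" and the "s"/"z" substitution each break the exact match, but the pieces on either side are close enough to be joined into one region.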