Find similar words in strings in a for loop with Python

Question

I'm working with tweets, and after text processing the code returns something like:

Lorem ipsum dolor sit amaet vi
Lorem ipsum dolor sit amaet
Lorem ipsum dolor sit amaet via

So the SQLite database treats these records as unique. My question is: how can I detect that two strings share 5 of the same words and then skip the duplicate? Should I change my regex code or add an if statement?

My code:

        clean1 = re.sub(r"(?:@\S*|#\S*|http(?=.*://)\S*)", "", tweet.text)
        clean2 = re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t:])|(\w+:\/\/\S+)", " ", clean1)
        final = re.sub(r'^RT[\s]+', '', clean2)

Thanks!

Does my answer solve your problem?

– armnotstrong, Commented Aug 3, 2017 at 3:43

armnotstrong · Accepted Answer · 2017-08-03 03:01:33Z

2

I don't think regex will help in this situation.

You could do this to check whether two lines have 5 words in common:

str1 = "Lorem ipsum dolor sit amaet vi"
str2 = "Lorem ipsum dolor sit amaet"

# Count the words of str2 that also appear in str1
count = 0
str1_split = str1.split(" ")
for word in str2.split(" "):
    if word in str1_split:
        count += 1

print(count)  # 5
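
To turn that count into the "skip it" check from the question, one option is to wrap it in a small function and filter the cleaned tweets before inserting them into the database. This is only a sketch: `shares_enough_words`, `drop_near_duplicates`, and the `threshold` parameter are hypothetical names, and it compares sets of words, so a repeated word is counted once.

```python
def shares_enough_words(a, b, threshold=5):
    """Return True if a and b have at least `threshold` distinct words in common."""
    common = set(a.split()) & set(b.split())
    return len(common) >= threshold


def drop_near_duplicates(texts, threshold=5):
    """Keep a text only if it is not too similar to one that was already kept."""
    kept = []
    for text in texts:
        if any(shares_enough_words(text, seen, threshold) for seen in kept):
            continue  # near-duplicate of an earlier text, skip it
        kept.append(text)
    return kept


tweets = [
    "Lorem ipsum dolor sit amaet vi",
    "Lorem ipsum dolor sit amaet",
    "Lorem ipsum dolor sit amaet via",
]
print(drop_near_duplicates(tweets))  # ['Lorem ipsum dolor sit amaet vi']
```

Each new text is compared against every text kept so far, so this is quadratic in the number of tweets. That is probably fine for small batches, but it is an assumption about your data size.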


Rohit-Pandey · Accepted Answer · 2017-08-03 08:10:30Z

0

Here is a way to count the words that match at the same position in two strings:

a = "Lorem ipsum dolor sit amaet vi"
b = "Lorem ipsum dolor sit amaet"
count = 0
# zip pairs the words by position and stops at the shorter string
for i, j in zip(a.split(), b.split()):
    if i == j:
        count += 1
print(count)

Output:

5
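
Note that `zip` pairs words by position, so a single extra word at the start of one string shifts every pair and the count drops to zero. If word order should not matter, a multiset intersection with `collections.Counter` is one alternative. The helper name `common_word_count` and the sample string with a leading "RT" are made up for this sketch.

```python
from collections import Counter


def common_word_count(a, b):
    """Count words shared by a and b, ignoring position but respecting repeats."""
    shared = Counter(a.split()) & Counter(b.split())
    return sum(shared.values())


a = "RT Lorem ipsum dolor sit amaet"
b = "Lorem ipsum dolor sit amaet"

# Position-wise comparison finds nothing because every word is shifted by one.
positional = sum(1 for x, y in zip(a.split(), b.split()) if x == y)

print(positional)               # 0
print(common_word_count(a, b))  # 5
```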

edited Aug 3, 2017 at 8:10

answered Aug 3, 2017 at 3:31

