I want to find an exact substring of a string.

import string
a = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
sent = "i am accepting your invitation for the party"
token = sent.split(" ")
for j in range(0, len(a)):
    for i in range(0, len(token)):
        if token[i].find(a[j]) == 0:
            print "found", token[i], a[j], token[i].find(a[j])

Output:

> found accepting accept 0

Desired output:

> found accepting accept 0
> found party party* 0

I have tried several approaches, including re.search() and index(), but I can't get the desired output. If anybody knows how to do this, please help me out.

  • content of Posemo.csv : accept ,accepta*,accepted ,accepting ,accepts etc..

Solution:

import operator,csv,re
from collections import defaultdict
def post_features(inpt_word_first_char):
    # Return the dictionary entries that start with the given character.
    input_file="/home/user/Thesis/BOOKS/Features/Posemo.csv"
    # Use a with block so the file is closed after reading.
    with open(input_file,"r") as fin:
        read_list=fin.read()
    match_words=[word for word in read_list.split() if word.startswith(inpt_word_first_char)]
    return match_words



matches = defaultdict(list)    
input_line="I am accepting your invitation for the party"
input_line=input_line.lower()
input_words=input_line.split(" ")

for i in range(0,len(input_words)):
    inpt_word_first_char=input_words[i][0]
    match_words=post_features(inpt_word_first_char)
    # Strip the trailing wildcard so "accepta*" is compared as "accepta".
    match_words1=[]
    for k in range(0,len(match_words)):
        match_words1.append(match_words[k].rstrip("*"))
    for match in match_words1:
        if match in input_words[i]:
            if((len(input_words[i])>=len(match) and len(match)>2) or len(match)==len(input_words[i])):
                # Score: number of positions where the word and the entry agree.
                match_perc=map(operator.eq,input_words[i],match).count(True)
                matches[input_words[i]].append([match,match_perc])


##print matches

for word,match_percentage in matches.iteritems():
    print('Key: {} - Matched word : {}'.format(word,max(match_percentage[match_percentage.index(max(match_percentage))])))

2 Answers

Here is another approach which will filter only those keys that matched:

import re

needles = ['accept','freed','partie','accepta','freeing','party']
haystack = "I am accepting your invitation for the party."

words = re.findall(r'(\w+)', haystack)
results = [(word, key) for key in needles for word in words if key in word]

# Or, the long way

results = []
for key in needles:
    for word in words:
        if key in word:
            results.append((word, key))

for word,key in results:
    print('Found {} {}'.format(word, key))

If you want to know how many times a key matches, then you need a different approach:

import re
from collections import defaultdict

matches = defaultdict(list)
needles = ['accept','freed','partie','accepta','freeing','party']
haystack = "I am accepting your invitation for the party. No, really, I accept!"
words = re.findall(r'(\w+)', haystack)

for key in needles:
    for word in words:
        if key in word:
            matches[key].append(word)

for key, found in matches.iteritems():
    print('Key: {} - Total Matches: {}'.format(key, len(found)))
    for match in found:
        print('\t{}'.format(match))

Here is an example:

>>> needles
['accept', 'freed', 'partie', 'accepta', 'freeing', 'party', 'problem']
>>> haystack
'My party had two problems. One problem, and another problem. Too many people accepted the invitation to this party!'
>>> matches = defaultdict(list)
>>> words = re.findall(r'(\w+)', haystack)
>>> for key in needles:
...   for word in words:
...     if key in word:
...       matches[key].append(word)
... 
>>> for key, found in matches.iteritems():
...   print('Key: {} - Total Matches: {}'.format(key, len(found)))
...   for match in found:
...     print('\t{}'.format(match))
... 
Key: party - Total Matches: 2
    party
    party
Key: problem - Total Matches: 3
    problems
    problem
    problem
Key: accept - Total Matches: 1
    accepted

4 Comments

With needles = ['accept','freed','partie','accepta','freeing','party*','problem*'] and haystack = "These are my problems", there won't be any match, but "problems" should match "problem*" in the needles list.
You don't need the *; just change the needle to problem.
But if needles = ['hat'] and my input is "That is my problem", it will return a match. That is incorrect, right?
I tried your code, that's why I asked. Sorry for the disturbance.
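
The false positive raised in the comments above ('hat' found inside 'That') comes from testing with in, which accepts a match anywhere in the word. A minimal sketch of one way around it, assuming a trailing * only marks a prefix and that every needle should match from the start of a word. The function name prefix_matches is made up for illustration, and the print call is written so it runs on both Python 2 and Python 3:

```python
import re

def prefix_matches(needles, sentence):
    """Return (word, needle) pairs where the word starts with the needle."""
    words = re.findall(r'\w+', sentence.lower())
    results = []
    for needle in needles:
        # Assumption: a trailing * is only a prefix marker, so drop it.
        stem = needle.rstrip('*')
        if not stem:
            continue  # a bare "*" would match everything; skip it
        for word in words:
            if word.startswith(stem):
                results.append((word, needle))
    return results

needles = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
sentence = "i am accepting your invitation for the party"
for word, needle in prefix_matches(needles, sentence):
    print('found {} {}'.format(word, needle))
```

With the list from the question this reports accepting for accept and party for party*, while a needle such as hat no longer matches That because the comparison is anchored at the start of the word.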

You can use a simple comparison

a="namit"
b="amit"
if b in a:
    print("found")

So you don't have to split your sent string; just loop over the list a:

for x in a:
    if x in sent:
        print("found",x)

6 Comments

No need for ( ) in the if.
Habits are hard to kill if you are a hard-core C++ fan :P
You didn't get a match for party? Is that what you mean?
You are searching for "party*", not "party", so you will not get a match. Maybe change your list to ['accept','freed*','partie','accepta*','freeing','party']?
Yes, that is what I am asking. Is there any way to get a match?
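
One possible way to get a match while keeping the * in the list is to treat each entry as a shell-style wildcard pattern. The sketch below uses fnmatch.fnmatchcase from the standard library. It assumes that an entry without * should match only the identical word and that * stands for any ending, including an empty one. The name wildcard_matches is hypothetical:

```python
import fnmatch
import re

def wildcard_matches(patterns, sentence):
    """Return (word, pattern) pairs where the whole word fits the pattern."""
    words = re.findall(r'\w+', sentence.lower())
    return [(word, pattern)
            for pattern in patterns
            for word in words
            # fnmatchcase compares the full word, so "hat" does not fit "that".
            if fnmatch.fnmatchcase(word, pattern)]

patterns = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
sentence = "i am accepting your invitation for the party"
for word, pattern in wildcard_matches(patterns, sentence):
    print('found {} {}'.format(word, pattern))
```

Under these semantics party matches party*, but accepting no longer matches accept, because accept has no wildcard. If accepting should match, the entry would need to be written as accept*.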