Substring of a string using Python?

Question

I want to find the words in a string that match an exact substring from a list.

a = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
sent = "i am accepting your invitation for the party"
token = sent.split(" ")
for j in range(0, len(a)):
    for i in range(0, len(token)):
        # find() returns 0 only when a[j] is a prefix of token[i]
        if token[i].find(a[j]) == 0:
            print("found", token[i], a[j], token[i].find(a[j]))

Output:

> found accepting accept 0

Desired output:

> found accepting accept 0
> found party party* 0

I have tried a lot, using re.search(), index(), and so on, but I didn't get the desired output. If anybody knows how to do this, please help me out.

Content of Posemo.csv: accept ,accepta*,accepted ,accepting ,accepts etc.

Solution:

import operator
from collections import defaultdict


def post_features(inpt_word_first_char):
    # Return the dictionary words that start with the given character.
    input_file = "/home/user/Thesis/BOOKS/Features/Posemo.csv"
    with open(input_file, "r") as fin:
        read_list = fin.read()
    return [word for word in read_list.split() if word.startswith(inpt_word_first_char)]


matches = defaultdict(list)
input_line = "I am accepting your invitation for the party"
input_line = input_line.lower()
input_words = input_line.split(" ")

for input_word in input_words:
    match_words = post_features(input_word[0])
    # Drop the trailing "*" wildcard marker from each dictionary word.
    match_words1 = [word.rstrip("*") for word in match_words]
    for match in match_words1:
        if match in input_word:
            if (len(input_word) >= len(match) and len(match) > 2) or len(match) == len(input_word):
                # Count the positions where both words have the same character.
                match_perc = list(map(operator.eq, input_word, match)).count(True)
                matches[input_word].append([match, match_perc])

## print(matches)

for word, match_percentage in matches.items():
    # max() compares the [match, count] pairs by the match string first.
    print('Key: {} - Matched word : {}'.format(word, max(match_percentage)[0]))

Burhan Khalid · Accepted Answer · 2014-07-11 10:40:59Z

1

Here is another approach which will filter only those keys that matched:

import re

needles = ['accept','freed','partie','accepta','freeing','party']
haystack = "I am accepting your invitation for the party."

words = re.findall(r'(\w+)', haystack)
results = [(word, key) for key in needles for word in words if key in word]

# Or, the long way

results = []
for key in needles:
    for word in words:
        if key in word:
            results.append((word, key))

for word,key in results:
    print('Found {} {}'.format(word, key))

If you want to know how many times a key matches, then you need a different approach:

import re
from collections import defaultdict

matches = defaultdict(list)
needles = ['accept','freed','partie','accepta','freeing','party']
haystack = "I am accepting your invitation for the party. No, really, I accept!"
words = re.findall(r'(\w+)', haystack)

for key in needles:
    for word in words:
        if key in word:
            matches[key].append(word)

# items() works on both Python 2 and Python 3; iteritems() is Python 2 only.
for key, found in matches.items():
    print('Key: {} - Total Matches: {}'.format(key, len(found)))
    for match in found:
        print('\t{}'.format(match))

Here is an example:

>>> needles
['accept', 'freed', 'partie', 'accepta', 'freeing', 'party', 'problem']
>>> haystack
'My party had two problems. One problem, and another problem. Too many people accepted the invitation to this party!'
>>> matches = defaultdict(list)
>>> words = re.findall(r'(\w+)', haystack)
>>> for key in needles:
...   for word in words:
...     if key in word:
...       matches[key].append(word)
... 
>>> for key, found in matches.items():
...   print('Key: {} - Total Matches: {}'.format(key, len(found)))
...   for match in found:
...     print('\t{}'.format(match))
... 
Key: accept - Total Matches: 1
    accepted
Key: party - Total Matches: 2
    party
    party
Key: problem - Total Matches: 3
    problems
    problem
    problem
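
The needles in the question carry a trailing `*`, and the desired output there treats every needle as a prefix of a word ("accept" matches "accepting", "party*" matches "party"). The following is only a sketch under that reading of the question: strip the `*` and test with `str.startswith`, which also avoids hits in the middle of a word. The name `find_prefix_matches` is hypothetical and does not come from the code above.

```python
import re


def find_prefix_matches(needles, haystack):
    """Return (word, needle) pairs where the needle, minus any trailing '*',
    is a prefix of the word (case-insensitive)."""
    words = re.findall(r'\w+', haystack.lower())
    results = []
    for needle in needles:
        prefix = needle.rstrip('*').lower()
        if not prefix:
            continue  # a bare '*' would match every word; skip it
        for word in words:
            if word.startswith(prefix):
                results.append((word, needle))
    return results


needles = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
haystack = "I am accepting your invitation for the party"

for word, needle in find_prefix_matches(needles, haystack):
    print('found {} {}'.format(word, needle))
```

With the question's input this reports "accepting" for "accept" and "party" for "party*", and nothing for the other needles.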

edited Jul 11, 2014 at 10:40

answered Jul 11, 2014 at 10:12

Burhan Khalid


4 Comments

chinnu jithin Over a year ago

With needles = ['accept','freed','partie','accepta','freeing','party*','problem*'] and haystack = "These are my problems", there won't be any match, but "problems" should match "problem*" in the needles list.

Burhan Khalid Over a year ago

You don't need the *; just change the needle to problem.

chinnu jithin Over a year ago

But if needles = ['hat'] and my input is "That is my problem", it will return a match. That is incorrect, right?

chinnu jithin Over a year ago

I tried your code; that's why I asked. Sorry for the disturbance.
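
The concern in the last comments is that `key in word` also matches in the middle of a word, so 'hat' is found in 'That'. One possible way around this, offered here as an assumption and not as the answer's own method, is to anchor each needle at a word boundary with `\b` and let `\w*` take the rest of the word. The needles are assumed to be written without `*`, and `count_prefix_hits` is a hypothetical name.

```python
import re
from collections import defaultdict


def count_prefix_hits(needles, haystack):
    """Map each needle to the words in haystack that begin with it (case-insensitive)."""
    hits = defaultdict(list)
    for needle in needles:
        # \b anchors the needle at the start of a word; \w* consumes the rest of the word.
        pattern = re.compile(r'\b' + re.escape(needle) + r'\w*', re.IGNORECASE)
        for word in pattern.findall(haystack):
            hits[needle].append(word)
    return dict(hits)


print(count_prefix_hits(['hat', 'problem'],
                        "That is my problem. These are my problems."))
```

Here 'hat' gets no entry because it only occurs inside "That", while 'problem' collects both "problem" and "problems".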

Namit Sinha · Accepted Answer · 2014-07-11 10:04:30Z

1

You can use a simple membership test with in:

a="namit"
b="amit"
if b in a:
    print("found")

So you don't have to split your sent string; just loop over your list a:

for x in a:
    if x in sent:
        print("found",x)

edited Jul 11, 2014 at 10:04

answered Jul 11, 2014 at 9:56

Namit Sinha


6 Comments

Burhan Khalid Over a year ago

No need for ( ) in the if.

Namit Sinha Over a year ago

Habits are hard to kill if you are a hardcore C++ fan :P

Namit Sinha Over a year ago

You didn't get a result (match) for party? Is that what you want to say?

Namit Sinha Over a year ago

You are searching for "party*", not "party", so you will not get a match. Maybe change your list to ['accept','freed*','partie','accepta*','freeing','party']?

chinnu jithin Over a year ago

Yes, that is what I am asking. Is there any way to get a match?
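
If the `*` is meant as a real wildcard, so that 'party*' matches any word starting with 'party' while a needle without `*` must equal the whole word, the standard library's `fnmatch` module already handles that kind of pattern. A short sketch, assuming those semantics are what is wanted. It is stricter than the desired output in the question, because plain 'accept' no longer matches 'accepting'.

```python
from fnmatch import fnmatchcase

a = ['accept', 'freed*', 'partie*', 'accepta*', 'freeing', 'party*']
sent = "i am accepting your invitation for the party"

# Pair each word with every pattern it satisfies; '*' matches any run of characters.
found = [(word, pattern)
         for pattern in a
         for word in sent.split()
         if fnmatchcase(word, pattern)]

for word, pattern in found:
    print("found", word, pattern)
```

Keeping the `*` in the list this way means the list itself decides which entries are prefixes and which are exact words.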
