How to leave only one defined sub-string in a string in Python

Question

Say I have one of the strings:

"a b c d e f f g" || "a b c f d e f g"

And I want there to be only one occurrence of a substring (f in this instance) throughout the string so that it is somewhat sanitized. The result of each string would be:

"a b c d e f g" || "a b c d e f g"

An example of the use would be:

str = "a b c d e f g g g g g h i j k l"
str.leaveOne("g") 
# a b c d e f g h i j k l

Why are you passing "f" to str.leaveOne but it is removing gs?
– Mark, Commented May 16, 2019 at 4:40
In a way, @Chris, it splits by the letter, then removes all the empty pieces for (n-1) of them and replaces the last of the n sequence with the character.
– Jack Hales, Commented May 16, 2019 at 4:59

gmds · Accepted Answer · 2019-05-16 05:08:32Z

If it doesn't matter which instance you leave, you can use str.replace, which takes a parameter signifying the number of replacements you want to perform:

def leave_one_last(source, to_remove):
    return source.replace(to_remove, '', source.count(to_remove) - 1)

This will leave the last occurrence.
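
As a small illustration of the count argument on its own (the sample string here is made up for the demonstration), str.replace stops after the given number of replacements, working from the left:

```python
text = "a g b g c g"

# The third argument caps how many matches are replaced, counting from the left,
# so only the last "g" survives here.
print(text.replace("g", "", 2))

# When the substring is absent, count() - 1 evaluates to -1. A negative count
# means "replace all", which is harmless because there is nothing to replace.
print("a b c".replace("g", "", "a b c".count("g") - 1))
```

Note that the spaces around the removed characters are left in place.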

We can modify it to leave the first occurrence by reversing the string twice:

def leave_one_first(source, to_remove):
    # Reverse to_remove as well, so substrings longer than one character still match.
    return source[::-1].replace(to_remove[::-1], '', source.count(to_remove) - 1)[::-1]

However, that is ugly, not to mention inefficient. A more elegant way might be to take the substring that ends with the first occurrence of the character to find, replace occurrences of it in the rest, and finally concatenate them together:

def leave_one_first_v2(source, to_remove):
    # Index just past the first occurrence (raises ValueError if to_remove is absent).
    first_index = source.index(to_remove) + len(to_remove)
    return source[:first_index] + source[first_index:].replace(to_remove, '')

If we try this:

string = "a b c d e f g g g g g h i j k l g"

print(leave_one_last(string, 'g'))
print(leave_one_first(string, 'g'))
print(leave_one_first_v2(string, 'g'))

Output:

a b c d e f      h i j k l g
a b c d e f g     h i j k l 
a b c d e f g     h i j k l
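
The keep-the-first idea can also be sketched with str.partition, which splits around the first match in one step. The function name leave_one_first_partition is hypothetical, and this is only one possible variant, assuming you want the input returned unchanged when the substring is missing:

```python
def leave_one_first_partition(source, to_remove):
    """Keep the first occurrence of to_remove and delete every later one."""
    # str.partition raises ValueError on an empty separator, so guard against it.
    if not to_remove:
        return source
    head, sep, tail = source.partition(to_remove)
    # sep is empty when to_remove does not occur; return the input untouched.
    if not sep:
        return source
    return head + sep + tail.replace(to_remove, '')


print(leave_one_first_partition("a b c d e f g g g g g h i j k l g", "g"))
print(leave_one_first_partition("a b c ffff d e ff g", "ff"))
print(leave_one_first_partition("a b c d e", "g"))
```

As with the versions above, the surrounding spaces are not touched.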

If you don't want to keep spaces, then you should use a version based on split:

def leave_one_split(source, to_remove):
    chars = source.split()
    first_index = chars.index(to_remove) + 1
    return ' '.join(chars[:first_index] + [char for char in chars[first_index:] if char != to_remove])

string = "a b c d e f g g g g g h i j k l g"

print(leave_one_split(string, 'g'))

Output:

a b c d e f g h i j k l
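
Note that chars.index raises ValueError when the token never appears. A minimal sketch of a split-based variant that tolerates a missing token, using a hypothetical name leave_one_token and a simple "seen" flag, might look like this:

```python
def leave_one_token(source, to_remove):
    """Keep the first whitespace-separated token equal to to_remove, drop the rest."""
    kept = []
    seen = False
    for token in source.split():
        if token == to_remove:
            if seen:
                # A later duplicate: skip it.
                continue
            seen = True
        kept.append(token)
    # Joining with a single space also collapses any runs of whitespace.
    return ' '.join(kept)


print(leave_one_token("a b c d e f g g g g g h i j k l g", "g"))
print(leave_one_token("a b c", "g"))
```

Like the split-based version above, this compares whole tokens, so it would not remove a "g" embedded in a longer word.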

Mark · Accepted Answer · 2019-05-16 05:18:44Z

If I understand correctly, you can just use a regex and re.sub to look for groups of two or more of your letter with or without a space and replace it by a single instance:

import re
def leaveOne(s, char):  
    return re.sub(r'((%s\s?)){2,}' % char, r'\1' , s)

leaveOne("a b c d e f g g g h i j k l", 'g') 
# 'a b c d e f g h i j k l'

leaveOne("a b c d e f ggg h i j k l", 'g')
# 'a b c d e f g h i j k l'

leaveOne("a b c d e f g h i j k l", 'g')
# 'a b c d e f g h i j k l'

EDIT

If the goal is to get rid of all occurrences of the letter except one, you can still use a regex with a lookahead to select all letters followed by the same:

import re
def leaveOne(s, char):  
    return re.sub(r'(%s)\s?(?=.*?\1)' % char, '' , s)

print(leaveOne("a b c d e f g g g h i j k l g", 'g'))
# 'a b c d e f h i j k l g'

print(leaveOne("a b c d e f ggg h i j k l gg g", 'g'))
# 'a b c d e f h i j k l g'

print(leaveOne("a b c d e f g h i j k l", 'g'))
# 'a b c d e f g h i j k l'

This should even work with more complicated patterns like:

leaveOne("a b c ffff d e ff g", 'ff')
# 'a b c d e ff g'
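
One caveat: because the substring is interpolated straight into the pattern, characters that are special in regular expressions (such as "+" or ".") will be interpreted as regex syntax. A hedged sketch of the same lookahead approach with re.escape applied, under the hypothetical name leave_one_escaped:

```python
import re


def leave_one_escaped(s, sub):
    """Remove every occurrence of sub that is followed later by another one."""
    # re.escape makes characters such as "+" or "." match literally.
    pattern = r'(%s)\s?(?=.*?\1)' % re.escape(sub)
    return re.sub(pattern, '', s)


print(leave_one_escaped("a + b + c", "+"))
print(leave_one_escaped("a b c ffff d e ff g", "ff"))
```

Since "." does not match newlines by default, this sketch only looks for later occurrences on the same line.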

edited May 16, 2019 at 5:18

answered May 16, 2019 at 5:00

2 Comments

Chris Over a year ago

This won't work in case of leaveOne("a b c f d e f g", 'f') > 'a b c f d e f g', where duplicates are not adjacent

Mark Over a year ago

I think I read the question differently than you, @Chris. That looks like the correct result to me. It gets rid of substrings of multiple fs. There aren't any in your example.

JustABeginner · Accepted Answer · 2019-05-16 13:09:26Z

Given the string:

mystr = 'defghhabbbczasdvakfafj'

cache = {}

seq = 0
for i in mystr:
    if i not in cache:
        cache[i] = seq
        print (cache[i])
        seq+=1

mylist = []

Here I order the dictionary items by their values, i.e. by the position at which each character first appeared:

for key, value in sorted(cache.items(), key=lambda x: x[1]):
    mylist.append(key)
print("".join(mylist))
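
For comparison, the same "keep the first appearance of every character" result can be sketched more compactly with dict.fromkeys, assuming Python 3.7 or later, where dictionaries preserve insertion order. The helper name dedupe_chars is hypothetical:

```python
def dedupe_chars(text):
    """Drop repeated characters, keeping each one at its first position."""
    # dict.fromkeys keeps one key per distinct character, in insertion order.
    return "".join(dict.fromkeys(text))


print(dedupe_chars('defghhabbbczasdvakfafj'))
```

This removes duplicates of every character, not only of one chosen substring.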

edited May 16, 2019 at 13:09

answered May 16, 2019 at 5:43
