Say I have one of the strings:

"a b c d e f f g" || "a b c f d e f g"

And I want there to be only one occurrence of a substring (f in this instance) throughout the string so that it is somewhat sanitized. The result of each string would be:

"a b c d e f g" || "a b c d e f g"

An example of the use would be:

str = "a b c d e f g g g g g h i j k l"
str.leaveOne("g")
# a b c d e f g h i j k l
  • Why are you passing "f" to str.leaveOne but it is removing gs? Commented May 16, 2019 at 4:40
  • So which occurrence do you want to omit? Commented May 16, 2019 at 4:41
  • @MarkMeyer apologies, I fixed the question. Commented May 16, 2019 at 4:51
  • So only the last of the duplicates is kept? Commented May 16, 2019 at 4:52
  • In a way, @Chris. It splits by the letter, then removes all the empty pieces for (n-1) of them and replaces the last of the n sequence with the character. Commented May 16, 2019 at 4:59

3 Answers

If it doesn't matter which instance you leave, you can use str.replace, which takes an optional third argument (count) that caps the number of replacements performed:

def leave_one_last(source, to_remove):
    return source.replace(to_remove, '', source.count(to_remove) - 1)

This will leave the last occurrence.
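
To make the role of the count argument concrete, here is a minimal sketch of how str.replace behaves with it. The sample strings and variable names are illustrative assumptions, not part of the question:

```python
text = "a b c f d e f g"

# count=1 stops after the first match, so only the first "f" is removed.
# Note that the space around the removed "f" is left behind.
first_removed = text.replace("f", "", 1)
print(first_removed)

# A count of 0 performs no replacements, which is why passing
# count("f") - 1 leaves a string with a single "f" untouched.
single = "a b c f"
unchanged = single.replace("f", "", single.count("f") - 1)
print(unchanged)
```

Because replacements are applied from the left, capping them at count - 1 is what leaves the rightmost occurrence in place.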

We can modify it to leave the first occurrence by reversing the string twice:

def leave_one_first(source, to_remove):
    # Reverse the substring as well, so multi-character substrings still match.
    return source[::-1].replace(to_remove[::-1], '', source.count(to_remove) - 1)[::-1]

However, that is ugly, not to mention inefficient. A more elegant way is to cut the string just after the first occurrence of the substring, remove every occurrence from the remainder, and concatenate the two parts:

def leave_one_first_v2(source, to_remove):
    # Slice just past the first match; len() also covers multi-character substrings.
    first_index = source.index(to_remove) + len(to_remove)
    return source[:first_index] + source[first_index:].replace(to_remove, '')
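
Note that source.index raises ValueError when the substring is absent. One possible variant, sketched here with str.partition under the assumption that a missing substring should simply return the input unchanged (the function name is hypothetical):

```python
def leave_one_first_partition(source, to_remove):
    # partition splits around the first match; if there is no match,
    # it returns (source, '', ''), so the input comes back unchanged.
    head, sep, tail = source.partition(to_remove)
    return head + sep + tail.replace(to_remove, '')


print(leave_one_first_partition("a b c f d e f g", "f"))
print(leave_one_first_partition("a b c", "f"))
```

Since partition returns the separator itself, this also works for substrings longer than one character without any index arithmetic.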

If we try this:

string = "a b c d e f g g g g g h i j k l g"

print(leave_one_last(string, 'g'))
print(leave_one_first(string, 'g'))
print(leave_one_first_v2(string, 'g'))

Output:

a b c d e f      h i j k l g
a b c d e f g     h i j k l 
a b c d e f g     h i j k l 

If you don't want to keep the leftover spaces, use a version based on split instead:

def leave_one_split(source, to_remove):
    chars = source.split()
    first_index = chars.index(to_remove) + 1
    return ' '.join(chars[:first_index] + [char for char in chars[first_index:] if char != to_remove])

string = "a b c d e f g g g g g h i j k l g"

print(leave_one_split(string, 'g'))

Output:

a b c d e f g h i j k l
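
As with str.index, chars.index raises ValueError if the token never appears. A rough sketch of a single-pass alternative that tolerates a missing token, assuming whitespace-separated tokens as above (the function name is hypothetical):

```python
def leave_one_split_safe(source, to_remove):
    kept = []
    seen = False
    for token in source.split():
        if token == to_remove:
            if seen:
                # Skip every occurrence after the first one.
                continue
            seen = True
        kept.append(token)
    return ' '.join(kept)


print(leave_one_split_safe("a b c d e f g g g g g h i j k l g", "g"))
print(leave_one_split_safe("a b c", "g"))
```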

If I understand correctly, you can use a regex with re.sub to find runs of two or more of your letter, with or without a space between them, and replace each run with a single instance:

import re
def leaveOne(s, char):
    return re.sub(r'((%s\s?)){2,}' % char, r'\1', s)

leaveOne("a b c d e f g g g h i j k l", 'g') 
# 'a b c d e f g h i j k l'

leaveOne("a b c d e f ggg h i j k l", 'g')
# 'a b c d e f g h i j k l'

leaveOne("a b c d e f g h i j k l", 'g')
# 'a b c d e f g h i j k l'

EDIT

If the goal is to get rid of all occurrences of the letter except one, you can still use a regex, this time with a lookahead that matches every occurrence followed later in the string by another one, so only the last occurrence survives:

import re
def leaveOne(s, char):
    return re.sub(r'(%s)\s?(?=.*?\1)' % char, '', s)

print(leaveOne("a b c d e f g g g h i j k l g", 'g'))
# 'a b c d e f h i j k l g'

print(leaveOne("a b c d e f ggg h i j k l gg g", 'g'))
# 'a b c d e f h i j k l g'

print(leaveOne("a b c d e f g h i j k l", 'g'))
# 'a b c d e f g h i j k l'

This should even work with more complicated patterns like:

leaveOne("a b c ffff d e ff g", 'ff')
# 'a b c d e ff g'
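
One caveat: the substring is interpolated into the pattern as-is, so characters with a special meaning in regular expressions (such as "." or "+") would be interpreted as regex syntax. A possible safeguard, sketched here as an assumption rather than as part of the original answer, is to pass the substring through re.escape first (the function name is hypothetical):

```python
import re


def leave_last_escaped(s, sub):
    # re.escape makes the substring match literally, even if it
    # contains regex metacharacters such as "." or "+".
    pattern = r'(%s)\s?(?=.*?\1)' % re.escape(sub)
    return re.sub(pattern, '', s)


print(leave_last_escaped("a . b . c .", "."))
print(leave_last_escaped("a b c ffff d e ff g", "ff"))
```

For plain letters the behavior should be the same as the version above, since re.escape leaves alphanumeric characters untouched.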

2 Comments

This won't work for leaveOne("a b c f d e f g", 'f'), which returns 'a b c f d e f g', because the duplicates are not adjacent.
I think I read the question differently than you, @Chris. That looks like the correct result to me. It gets rid of substrings of multiple fs. There aren't any in your example.

Given the string:

mystr = 'defghhabbbczasdvakfafj'

# Maps each character to the position at which it was first seen.
cache = {}

seq = 0
for i in mystr:
    if i not in cache:
        cache[i] = seq
        print(cache[i])
        seq += 1

mylist = []

Here I have sorted the dictionary by its values:

for key, value in sorted(cache.items(), key=lambda x: x[1]):
    mylist.append(key)
print("".join(mylist))
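
The loop above keeps the first occurrence of every character, in order of appearance. A more compact sketch of the same idea, assuming Python 3.7 or later, where dictionaries are guaranteed to preserve insertion order:

```python
mystr = 'defghhabbbczasdvakfafj'

# dict.fromkeys keeps one key per distinct character, in first-seen order
# (insertion order is guaranteed for dicts from Python 3.7 onward).
deduped = "".join(dict.fromkeys(mystr))
print(deduped)  # defghabczsvkj
```

Like the loop, this deduplicates every character rather than one chosen substring.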
