0

I want to remove part of a string if that part appears in a list.

For example :

List

foo = ['.com', '.net', '.co', '.in']

Convert these strings:

google.com
google.co.in
google.net
google.com/gmail/

to these strings:

google  
google  
google  
google/gmail/

So far I have found this solution: "replace item in a string if it matches an item in the list". Is there a more efficient way to do it?

7
  • Why were '.co' and '.in' appended to the same string? Commented Jun 25, 2018 at 14:14
  • And? What have you tried that didn't work? Commented Jun 25, 2018 at 14:14
  • @Ev.Kounis These are all URLs, e.g. google.co.in Commented Jun 25, 2018 at 14:18
  • @brunodesthuilliers stackoverflow.com/questions/9396302/… I found this solution, but I want to know whether there is a more efficient way to do it. Commented Jun 25, 2018 at 14:21
  • You should have mentioned this other solution in your question. So, what's wrong with the solution in 9396302? Commented Jun 25, 2018 at 14:31

5 Answers

1

Similar to George Shulkin's answer.

import re
suffixes = ['.com', '.co', '.in', '.net']
# re.escape makes each '.' literal; unescaped, '.' would match any character
patterns = [re.compile(re.escape(suffix)) for suffix in suffixes]

def remove_suffixes(s: str) -> str:
    for pattern in patterns:
        s = pattern.sub("", s)
    return s

# urls = ["google.com", ...
clean_urls = map(remove_suffixes, urls)
# or clean_urls = [remove_suffixes(url) for url in urls]

You might want to use the list comprehension, because it can be faster than map in many cases.

This has the advantage of also compiling the regexes, which can be better for performance when used in a loop.
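
Whether the list comprehension actually beats map depends on the Python version and the data, so it is worth measuring. Below is a minimal timeit sketch, assuming the four sample URLs from the question and an arbitrary repetition count; with_map and with_comprehension are hypothetical names, and the absolute numbers will vary by machine.

```python
import re
import timeit

suffixes = ['.com', '.co', '.in', '.net']
patterns = [re.compile(re.escape(suffix)) for suffix in suffixes]
urls = ["google.com", "google.co.in", "google.net", "google.com/gmail/"]

def remove_suffixes(s: str) -> str:
    # Apply each precompiled pattern in turn.
    for pattern in patterns:
        s = pattern.sub("", s)
    return s

def with_map():
    return list(map(remove_suffixes, urls))

def with_comprehension():
    return [remove_suffixes(url) for url in urls]

# number=10_000 is an arbitrary choice; raise it for steadier timings.
map_time = timeit.timeit(with_map, number=10_000)
comp_time = timeit.timeit(with_comprehension, number=10_000)
print(f"map: {map_time:.4f}s  comprehension: {comp_time:.4f}s")
```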

Or, if you decide to use functools.reduce:

from functools import reduce

def remove_suffixes(s: str) -> str:
    return reduce(lambda s, pattern: pattern.sub("", s), patterns, s) 

1

You can use re.sub and str.join:

import re
foo = ['.com', '.net', '.co', '.in']
urls = ["google.com","google.co.in","google.net","google.com/gmail/"]
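# re.escape keeps each '.' literal; '.com' precedes '.co' in foo, so the longer suffix is tried first
final_result = [re.sub('|'.join(map(re.escape, foo)), '', i) for i in urls]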

Output:

['google', 'google', 'google', 'google/gmail/']
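
One caveat with joining the list into a single alternation: Python's re module tries alternatives left to right, so the result depends on the order of foo (with '.co' ahead of '.com', "google.com" would come out as "googlem"). The sketch below is one way to make the result independent of that order, assuming it is acceptable to sort the suffixes longest-first. build_pattern and strip_suffixes are hypothetical helper names, not part of the answer above.

```python
import re

def build_pattern(suffixes):
    """Compile one alternation that matches any of the literal suffixes."""
    # Longest first, so '.com' is tried before its prefix '.co'.
    ordered = sorted(suffixes, key=len, reverse=True)
    return re.compile('|'.join(re.escape(s) for s in ordered))

def strip_suffixes(urls, suffixes):
    """Return a new list with every occurrence of each suffix removed."""
    pattern = build_pattern(suffixes)
    return [pattern.sub('', url) for url in urls]

# '.co' deliberately precedes '.com' here to show the order no longer matters.
foo = ['.co', '.com', '.net', '.in']
urls = ["google.com", "google.co.in", "google.net", "google.com/gmail/"]
print(strip_suffixes(urls, foo))
```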

0

You need to split this task into two parts:

  1. Write a function that replaces a matched substring with a new string.
  2. Apply this function to every item in the list.

The first can be done with a regular expression (see below). The second can be done with the map function.

Example of code that replaces a substring:

>>> import re
>>> re.sub(r"\.com", "",  "google.com/gmail/")
'google/gmail/'

Example of using the map function:

>>> list(map(lambda x: len(x), ["one", "two", "three"]))
[3, 3, 5]

(it maps each element of the list to its length; in Python 3, map returns an iterator, hence the list() call).

You can combine those two to get what you want.
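
As a rough sketch of that combination, the snippet below binds the pattern and the replacement with functools.partial, so map needs no lambda. The single combined pattern and the name strip_domains are assumptions made for illustration, not part of the answer.

```python
import re
from functools import partial

suffixes = ['.com', '.net', '.co', '.in']
urls = ["google.com", "google.co.in", "google.net", "google.com/gmail/"]

# Escape each suffix so '.' is literal, then join into one alternation.
pattern = re.compile('|'.join(map(re.escape, suffixes)))

# pattern.sub(repl, string): partial fixes repl='' and leaves the string open.
strip_domains = partial(pattern.sub, '')

clean_urls = list(map(strip_domains, urls))
print(clean_urls)
```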

3 Comments

Why do you use the lambda in your example when map(len, ...) would do?
@EdwardMinnix I think he wanted to show an example using a lambda function, which might be more useful for the OP. Possibly not the best example here but anyway...
@Edward Minnix, because I can only fathom a proper partial application in Python. Without it, it's hard to see how to use re.sub in map without a lambda.
0

Using George Shuklin's suggestion, this is the simplest code I could come up with.


import re

domains = ['.com', '.net', '.co', '.in']

urls = ["google.com","google.co.in","google.net","google.com/gmail/"]

for i in range(len(urls)):
    for domain in domains:
        # re.escape treats the '.' in each domain as a literal dot
        urls[i] = re.sub(re.escape(domain), "", urls[i])

print(urls)

This outputs:

['google', 'google', 'google', 'google/gmail/']

2 Comments

Is this the best possible way? I don't want to use multiple for loops. Is there a built-in Python function that could do it?
@SahilSingla The functions you're looking for are probably map and reduce, but those can have overhead. I would suggest making a function that loops through replacements, and just use either map or a list comprehension.
0

Another alternative is to use str.replace() and str.find().

foo = ['.com', '.net', '.co', '.in']
domains = ["google.com", "google.co.in", "google.net", "google.com/gmail/"]

def remove_extensions(domain, extensions):
    for ext in extensions:
        if domain.find(ext) != -1:
            domain = domain.replace(ext, "")
    return domain

list(map(lambda x: remove_extensions(x, foo), domains))

This code snippet outputs the result as expected:

['google', 'google', 'google', 'google/gmail/']
