How to replace a string with other strings if the source string contains special characters [duplicate]

Question

To clean a string, I have to remove a substring that contains special (non-ASCII) UTF-8 characters.

Example:

source = "Skoda"
to_be_clean = "Škoda Rapid"

I need to remove the string source from to_be_clean (that is, replace it with nothing). The problem is that to_be_clean contains a special character ("Š" instead of "S"), so it does not literally contain source. Is there a simple way to do this? Here is how I am doing it today:

output = to_be_clean.replace(source + ' ', '')
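A quick check shows why this does nothing: str.replace matches code points exactly, and "Š" (U+0160) is a different code point from "S" (U+0053). The string below is written with an escape only to guarantee the precomposed "Š".

```python
source = "Skoda"
to_be_clean = "\u0160koda Rapid"  # "Škoda Rapid" with a precomposed Š (U+0160)

# str.replace does an exact, code-point-by-code-point match,
# so "Skoda " is never found and the string comes back unchanged.
output = to_be_clean.replace(source + ' ', '')
print(output)                                    # Škoda Rapid
print(hex(ord(to_be_clean[0])), hex(ord(source[0])))  # 0x160 0x53
```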

I considered a regular expression, but then I would need to list every possible accented variant of each character.

It's really not clear what you want. Are you hoping to find a way to make "Škoda" equal to "Skoda" so that you can then remove it? There are many questions about removing accents from Unicode; have you googled those?
– tripleee, Commented Feb 21, 2018 at 15:01
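Following the comment's idea of treating "Škoda" as equal to "Skoda" before removing it, here is a minimal sketch. It assumes the text to remove is a whole space-separated word, and fold_accents and remove_word are hypothetical helper names. Words are compared after stripping combining marks, but the words that are kept retain their original accents.

```python
import unicodedata

def fold_accents(text: str) -> str:
    # Decompose (NFKD), then drop combining marks such as U+030C COMBINING CARON.
    # Base letters without an ASCII equivalent are kept, not deleted.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

def remove_word(text: str, word: str) -> str:
    # Hypothetical helper: compare words accent-insensitively,
    # but return the original (accented) spelling of the words that remain.
    target = fold_accents(word)
    return ' '.join(w for w in text.split(' ') if fold_accents(w) != target)

print(remove_word("Škoda Rapid", "Skoda"))           # Rapid
print(remove_word("Škoda Rapid in Plzeň", "Skoda"))  # Rapid in Plzeň
```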

Rakesh · Accepted Answer · 2018-02-21 15:10:55Z


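The unicodedata module should solve your problem: normalize the string so that each accented letter is decomposed into its base letter plus a combining mark, then drop the non-ASCII marks.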

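import unicodedata

to_be_clean = "Škoda Rapid"

# NFKD splits "Š" into "S" + a combining caron; encoding to ASCII with
# 'ignore' drops the combining mark, and decode() turns the bytes back into a str.
print(unicodedata.normalize('NFKD', to_be_clean).encode('ascii', 'ignore').decode('ascii'))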

Output:

Skoda Rapid
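The answer normalizes the string but does not itself remove source. A minimal sketch that combines it with the question's str.replace call (strip_accents is a hypothetical helper name) normalizes both strings, so the exact match succeeds:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Same technique as the answer: decompose with NFKD, then drop every
    # non-ASCII code point (including the combining marks) when encoding.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

source = "Skoda"
to_be_clean = "Škoda Rapid"

# Both sides are now plain ASCII, so the question's replace call matches.
output = strip_accents(to_be_clean).replace(strip_accents(source) + ' ', '')
print(output)  # Rapid
```

Note that encode('ascii', 'ignore') removes every character that has no ASCII decomposition, not just accents: characters such as "ß" or "Ø" disappear entirely, and the remaining text also loses its accents.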



Michael Over a year ago

Thanks, exactly what I was looking for. I was actually not aware of the unicodedata module.
