Python .replace() function, removing backslash in certain way

Question

I have a huge string that contains Unicode escapes for emoji, such as "\u201d", as well as stray backslashes, as in "\advance\".

All I need is to remove the stray backslashes so that:

- \u201d stays \u201d
- \united\ becomes united

(The stray backslashes break the upload to a BigQuery database.)

I know it should be something like this:

string.replace('\\', '')

but I'm not sure how to keep the "\u201d" escapes.

ADDITIONAL: examples of the Unicode emoji escapes

\ud83d\udc9e
\u201c
\u2744\ufe0f\u2744\ufe0f\u2744\ufe0f

@DirtyBit Both, but I need to keep the backslash in front of the emoji escapes, as in "\u201d".
– Bobbby, Commented Apr 3, 2019 at 8:13
Do the emoji escapes have something in common, such as starting with the same letter (u)?
– BlueSheepToken, Commented Apr 3, 2019 at 8:15
@AriCooper-Davis Yes, exactly, you were right, this is the difficult part...
– Bobbby, Commented Apr 3, 2019 at 8:15
@BlueSheepToken Yes, they start with "u", but there can also be something like "\united\", whose backslashes have to be removed.
– Bobbby, Commented Apr 3, 2019 at 8:16

BlueSheepToken · Accepted Answer · 2019-04-03 11:13:16Z

1

You can remove every '\' and then use a regex to put a leading '\' back in front of each emoji escape:

import re

s = '\\advance\\\\united\\ud83d\\udc9e\\u201c\\u2744\\ufe0f\\u2744\\ufe0f\\u2744\\ufe0f'
# Drop every backslash, then put one back in front of each "u" + 4 hex digits.
print(re.sub('(u[a-f0-9]{4})', lambda m: '\\' + m.group(0), ''.join(s.split('\\'))))

As your emoji escapes are 'u' followed by 4 hex digits, 'u[a-f0-9]{4}' will match them all, and you just have to add the leading backslashes back.

First of all, you delete every '\' in the string with either ''.join(s.split('\\')) or s.replace('\\', '').

Then you match every emoji escape with the regex u[a-f0-9]{4} (a 'u' followed by four lowercase hex digits).

Finally, with re.sub, you replace every match with itself preceded by a backslash ('\\' in Python source).
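One side effect of the approach above is that a backslash is added in front of any 'u' followed by four hex digits, even where the input had none (a plain word such as "ubeef" would become "\ubeef"). A possible single-pass alternative, sketched here as an assumption rather than as part of this answer, removes only the backslashes that are not followed by 'u' and four hex digits, using a negative lookahead. The names STRAY_BACKSLASH and strip_stray_backslashes are illustrative.

```python
import re

# A backslash that is NOT followed by "u" plus four hex digits.
STRAY_BACKSLASH = re.compile(r'\\(?!u[0-9a-fA-F]{4})')

def strip_stray_backslashes(s):
    # Backslashes that start a \uXXXX escape are left in place; all others go.
    return STRAY_BACKSLASH.sub('', s)

s = '\\advance\\\\united\\ud83d\\udc9e\\u201c'
print(strip_stray_backslashes(s))  # advanceunited\ud83d\udc9e\u201c
```

Like the regex in the answer, this cannot tell a real escape from a word that happens to look like one, for example "\ufaced\": the leading backslash there would be kept.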

edited Apr 3, 2019 at 11:13

answered Apr 3, 2019 at 9:02


alec_djinn · Accepted Answer · 2019-04-03 10:51:13Z

1

You could simply add the backslash back at the front of the string after the replacement if the string starts with \u and contains at least one digit.

import re

def clean(s):

    re1='(\\\\)' # Any Single Character "\"
    re2='(u)'    # Any Single Character "u"
    re3='.*?'    # Non-greedy match on filler
    re4='(\\d)'  # Any Single Digit

    rg = re.compile(re1+re2+re3+re4,re.IGNORECASE|re.DOTALL)
    m = rg.search(s)

    if m:
        r = '\\'+s.replace('\\','')
    else:
        r = s.replace('\\','')
    return r


a = '\\u123'
b = '\\united\\'
c = '\\ud83d'

>>> print(a, b, c)
\u123 \united\ \ud83d

>>> print(clean(a), clean(b), clean(c))
\u123 united \ud83d

Of course, you have to split your string if multiple entries are on the same line:

string = '\\u123 \\united\\ \\ud83d'
clean_string = ' '.join([clean(word) for word in string.split()])
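Splitting with string.split() and re-joining with ' ' collapses runs of spaces and newlines into single spaces. If the original spacing should survive, one option is to apply the same per-word rule through re.sub on each run of non-whitespace characters. This is only a sketch: clean_token restates the check from clean above in compact form, and the names ESCAPE_LIKE, clean_token, clean_text and the sample text are illustrative.

```python
import re

# Same rule as clean(): a backslash, then "u", then at least one digit later on.
ESCAPE_LIKE = re.compile(r'\\u.*?\d', re.IGNORECASE | re.DOTALL)

def clean_token(token):
    stripped = token.replace('\\', '')
    # Put one leading backslash back only if the token looked like an escape.
    return '\\' + stripped if ESCAPE_LIKE.search(token) else stripped

def clean_text(text):
    # \S+ matches each whitespace-delimited word; the whitespace is untouched.
    return re.sub(r'\S+', lambda m: clean_token(m.group(0)), text)

text = '\\u123  \\united\\\n\\ud83d'
print(clean_text(text))
```

As with clean, a word that contains several escapes back to back (for example \ud83d\udc9e) gets only one leading backslash back, so this fits input where each escape is its own word.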

edited Apr 3, 2019 at 10:51

answered Apr 3, 2019 at 8:38


5 Comments

alec_djinn Over a year ago

That will fail. If this is the case, you need to use a regex to distinguish a u followed by a number from the rest. But then please update your question to include this kind of input.

alec_djinn Over a year ago

@Bobbby I have updated my answer. It looks for words starting with \u and containing at least one digit.

Bobbby Over a year ago

Sorry for the late reply, this looks amazing. But r = '\\'+s.replace('\\','') treats the string a as if it contained only a single value. What about string = \u123 \united\ \ud83d ?

alec_djinn Over a year ago

That's a very basic split and iterate. I am sure you can figure it out yourself. If not, I can post the solution in 1h.

alec_djinn Over a year ago

But then again, please edit your question and specify the input and the desired output.

DeshDeep Singh · Accepted Answer · 2019-04-03 08:43:44Z

0

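You can use this simple method to replace the last occurrence of a character, here the backslash:

def replace_character(s, old, new):
    # Reverse the string, replace the first match, then reverse back.
    return (s[::-1].replace(old[::-1], new[::-1], 1))[::-1]

print(replace_character('\\advance\\', '\\', ''))
print(replace_character('\\u201d', '\\', ''))

Output:

\advance
u201d

Note that only the last backslash is removed, so the single backslash in \u201d is stripped as well.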

answered Apr 3, 2019 at 8:43


MikkelDalby · Accepted Answer · 2019-04-03 09:06:10Z

0

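It can be as simple as this:

text = text.replace(text[-1],'')

Here you replace the last character with nothing. Note that str.replace removes every occurrence of that character, not only the last one, so when the text ends with a backslash this strips all backslashes, including those in \u201d.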
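A quick sketch of what this one-liner does on a sample value, next to a version that drops only a trailing backslash. The sample value and the helper name strip_trailing_backslash are illustrative and not part of the answer.

```python
text = '\\united\\ \\u201d\\'

# str.replace substitutes every occurrence, so all backslashes disappear.
removed_everywhere = text.replace(text[-1], '')
print(removed_everywhere)  # united u201d

def strip_trailing_backslash(s):
    # Remove a single trailing backslash, if present; leave the rest alone.
    return s[:-1] if s.endswith('\\') else s

print(strip_trailing_backslash(text))  # \united\ \u201d
```

Neither variant keeps the \u201d escape while also removing the backslashes around united, so a regex-based approach is still needed for the original requirement.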

answered Apr 3, 2019 at 9:06

