1

I am trying to replace a custom set of characters with a space (' '). Here is what confuses me:

If I just replace one character, it is OK:

import pandas as pd

a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace('/',' ')
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

If I use a short pattern that contains several characters, it also works:

b2='[?|:|-|\'|\\|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
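# regex=True is passed explicitly: pandas 2.0+ treats the pattern as a literal string by default
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b2,' ',regex=True)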
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

But when I use a longer pattern, nothing changes:

b1='[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
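# regex=True is passed explicitly: pandas 2.0+ treats the pattern as a literal string by default
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b1,' ',regex=True)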
a

The result is:

   title title1
0    a/b    a/b
1  a # b  a # b
2    a+b    a+b

You can see that in the first two examples / is replaced with ' '. In the last one the replacement does not happen, and I do not know why. Is there a limit on the length of the pattern string? Or is there a better way that I am not aware of?

Update

Thanks a lot, @Oliver Hao. But what I want is to do this for one (or more) columns in a data frame and then save the result back to the data frame as a new column. When I try:

regex = r"[?:\-'\\\|.()\[\]{}/]"
a.loc[:,'title1']=re.sub(regex," ",a.loc[:,'title'],0,re.MULTILINE)

I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\fefechen\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
2
  • I have not used Python myself. Check whether this is a Python version issue: the test code I gave targets 2.x, and you are using 3.x. Commented Aug 7, 2019 at 5:50
  • Look at this. Commented Aug 7, 2019 at 5:54

3 Answers

1

This expression might also work,

b1="[|,.:;+–_#&@!$%()[\]{}?'\"\/\\-]"

with fewer escapes.
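A minimal sketch of the same idea, assuming a raw string and a slightly reordered class. The name `pattern` and the list `titles` are illustrative, with the sample values taken from the question. Inside a character class only the brackets and the backslash are escaped here, and the hyphen is placed last so it cannot be read as a range:

```python
import re

# Raw string, so the backslashes reach the regex engine unchanged.
# Escaped inside the class: '[' and ']' (brackets), '\\' (backslash), and
# '\"' (only so the quote does not end the Python string).
# The '-' is the last character, so it is a literal hyphen, not a range.
pattern = re.compile(r"[|,.:;+–_#&@!$%(){}?'\"/\[\]\\-]")

titles = ['a/b', 'a # b', 'a+b']
cleaned = [pattern.sub(' ', t) for t in titles]
print(cleaned)
```

Each listed character is replaced on its own, so 'a # b' ends up with three consecutive spaces.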


0

Updated to: b1='[?:\-\'\\\|.()\[\]{}/]'

regex demo

Code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[?:\-'\\\|.()\[\]{}/]"

test_str = "'a/b','a # b','a+b'"

subst = " "

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
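For the follow-up about writing the result back into a data frame column: re.sub accepts a single string, so passing a whole Series raises the TypeError shown in the question. A minimal sketch, assuming the sample frame from the question and a pandas version that supports the `regex` argument of `Series.str.replace`:

```python
import pandas as pd

# Same character class as above, written as a raw string.
regex = r"[?:\-'\\\|.()\[\]{}/]"

a = pd.DataFrame({'title': ['a/b', 'a # b', 'a+b']})

# Series.str.replace applies the pattern to every element of the column.
# regex=True is passed explicitly because newer pandas versions default to
# literal (non-regex) replacement.
a['title1'] = a['title'].astype(str).str.replace(regex, ' ', regex=True)
print(a)
```

Only 'a/b' changes here, because '#' and '+' are not part of this character class.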

1 Comment

Hi, thanks a lot. But I need to save the result back to the data frame as a new column, so my case is different from your answer, and I do not know how to adapt it. Could you please take a look at my edited question above? Thanks.
0

I found the answer myself. The last pattern does not work as written; it should be:

b1="[?|:|\-|\–|\'|\\|.|\(|\)|\[|\]|\{|\}|/|#|+|,|;|_|\"|&|@|!|$|%|\|]"

That is, put \ in front of the special characters (such as [ and ]) so that they are treated literally.
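To see why the original b1 left a/b untouched, it may help to test the pattern directly with re. The sketch below assumes the pattern exactly as posted in the question; `fixed` is an illustrative name for a corrected class. The first unescaped ] closes the character class early, so the rest of the string becomes a set of alternatives and / is only matched as part of the two-character literal /]:

```python
import re

# Pattern exactly as posted in the question (not a raw string).
b1 = '[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'

# The class ends at the first unescaped ']', so the regex is really:
#   [?|:|-|'|\|.|(|)|[|]   OR   {   OR   }   OR   /]
print(re.sub(b1, ' ', 'a/b'))    # a lone '/' is not matched
print(re.sub(b1, ' ', 'a/]b'))   # '/]' is matched as a two-character literal

# Escaping the brackets keeps every character inside one class.
fixed = r"[?:\-'\\.()\[\]{}/]"
print(re.sub(fixed, ' ', 'a/b'))
```

So the length of the pattern is not the problem; the unescaped ] is.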

1 Comment

The pipes between characters in a character class are useless: inside [...], | is a literal character, not alternation.
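A short sketch of that point, using two hypothetical patterns (`with_pipes` and `without_pipes`) that differ only in the pipes. Both replace the intended characters, but the version with pipes also replaces any | in the input:

```python
import re

with_pipes = re.compile(r"[?|:|/]")     # '|' becomes a member of the class
without_pipes = re.compile(r"[?:/]")    # only '?', ':' and '/'

# Both patterns handle the intended characters the same way.
print(with_pipes.sub(' ', 'a/b?c'))
print(without_pipes.sub(' ', 'a/b?c'))

# Only the first one also replaces a pipe in the data.
print(with_pipes.sub(' ', 'a|b'))
print(without_pipes.sub(' ', 'a|b'))
```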
