
I am looking to do a conditional regex search-and-replace.

If a carriage return (\r) is followed by an uppercase letter and then lowercase letters (the start of a new word), I want to replace it with a space (' '). If the carriage return is followed by anything else, I just want to remove it. Is there a way to do that using regex in Python?

Sample Input:

BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort

Output:

BCP- Engineering Systems Support

The data is in a DataFrame. I am currently using df.replace() to replace "\r" with spaces (" "), but I would like the replacement to be conditional.

Here is my current code:

df_replace = df.replace(to_replace=r"\r", value = " ", regex=True)
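To see why an unconditional replacement is not enough, here is the same substitution done with Python's re module on the sample string (standing in for a single DataFrame cell; the variable names are only illustrative):

```python
import re

sample = "BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"

# Unconditional: every CR becomes a space, including the ones inside words.
unconditional = re.sub(r"\r", " ", sample)
print(unconditional)  # BCP- Engin eerin g Syste ms Supp ort
```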
  • Welcome to SO. What have you tried? Commented Aug 27, 2019 at 16:23
  • I have basically just tried replacing \r with nothing, but I am not sure how I would implement a conditional replace. Commented Aug 27, 2019 at 16:30
  • What is an "uppercase character" for you? If the "ASCII" range A-Z is good enough for your case, that's easy, but if you want to handle any Unicode upper-case character, that's harder in standard Python regex. Commented Aug 27, 2019 at 16:34
  • Edit your question and show your attempt. It will make it easier for us to help you. Commented Aug 27, 2019 at 16:34
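If "uppercase" has to cover any Unicode uppercase letter and not just A-Z, one possible sketch (not from the thread; the function name fix_breaks_unicode is hypothetical) passes a function to re.sub and lets str.isupper() decide, since the standard re module has no Unicode uppercase class:

```python
import re


def fix_breaks_unicode(text):
    """Hypothetical helper: CR before an uppercase letter -> space, else removed."""

    def repl(m):
        # Peek at the character right after the CR without consuming it,
        # so consecutive CRs are each handled on their own.
        nxt = m.string[m.end():m.end() + 1]
        # str.isupper() is Unicode-aware ("É".isupper() is True);
        # "" (CR at end of string) and digits/punctuation give False.
        return " " if nxt.isupper() else ""

    return re.sub(r"\r", repl, text)


print(fix_breaks_unicode("BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"))
print(fix_breaks_unicode("Caf\ré\rÉcole"))
```

The first call prints the expected "BCP- Engineering Systems Support"; the second removes the CR before the lowercase "é" and turns the one before "É" into a space.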

2 Answers


I am not familiar with Python, but the regexes you will need are as follows (perhaps someone with Python experience can edit this to adapt the code):

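This finds every \r that is not followed by an uppercase letter (a negative lookahead), i.e. the mid-word breaks, so replace these matches with an empty string:

\r(?![A-Z])

After that pass, this finds every remaining \r that is not followed by a lowercase letter, i.e. the ones that start a new word, so replace these with a space:

\r(?![a-z])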
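The same rule can also be written with a positive lookahead, which states the condition directly: a \r right before A-Z becomes a space, and any \r left over is removed. This is a minimal sketch under the assumption that "uppercase" means ASCII A-Z; the function name fix_breaks is made up for illustration:

```python
import re


def fix_breaks(text):
    # A CR directly before an ASCII uppercase letter starts a new word -> space.
    text = re.sub(r"\r(?=[A-Z])", " ", text)
    # Any CR still left is a mid-word break (or precedes a non-letter) -> remove.
    return text.replace("\r", "")


print(fix_breaks("BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"))
# BCP- Engineering Systems Support
```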

EDIT

Okay, here's one solution in Python I was able to put together for you:

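import re

myString = "BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"

# Remove every \r that is NOT followed by an uppercase letter (mid-word breaks)
myString = re.sub(r"\r(?![A-Z])", "", myString)
# Every remaining \r precedes an uppercase letter, so a plain string replace is enough
myString = myString.replace("\r", " ")

print(myString)  # BCP- Engineering Systems Support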
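The order of the two passes matters only when a \r is followed by something other than a letter. In this sketch (answer_order and reversed_order are illustrative names, not from the thread), both orders give the same result on the sample, but on a hypothetical string with a digit after a \r only the order used above removes it, as the question asks:

```python
import re


def answer_order(s):
    # Pass 1: drop every CR not followed by A-Z (lowercase, digits, end of string...).
    s = re.sub(r"\r(?![A-Z])", "", s)
    # Pass 2: every CR left precedes A-Z, so it becomes a space.
    return re.sub(r"\r(?![a-z])", " ", s)


def reversed_order(s):
    # Pass 1: any CR not followed by a-z becomes a space, digits included.
    s = re.sub(r"\r(?![a-z])", " ", s)
    # Pass 2: remaining CRs precede a-z and are removed.
    return re.sub(r"\r(?![A-Z])", "", s)


sample = "BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"
mixed = "Ver\r2\rAlpha"
print(answer_order(sample), "|", reversed_order(sample))  # identical
print(answer_order(mixed), "|", reversed_order(mixed))    # Ver2 Alpha | Ver 2 Alpha
```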

1 Comment

Will this work on a DataFrame, e.g. df_replace = df.replace(to_replace=r"\r", value = "@", regex=True)? Can I substitute your search pattern here?

I was able to solve this:

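# Remove every \r that is not followed by an uppercase letter
df_replace2 = df.replace(to_replace=r"\r(?![A-Z])", value="", regex=True)
# Every remaining \r precedes an uppercase letter; replace it with a space
df_replace3 = df_replace2.replace(to_replace=r"\r(?![a-z])", value=" ", regex=True)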

Thanks @Brigadeiro for guiding me to the solution.
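A quick check of this two-step df.replace approach on a hypothetical toy DataFrame, assuming pandas is installed (the column names dept and id are made up). With regex=True, DataFrame.replace substitutes the matched substrings inside string cells, and the numeric column is left alone:

```python
import pandas as pd

# Hypothetical toy data: a text column with embedded CRs and a numeric column.
df = pd.DataFrame({
    "dept": ["BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort", "Ver\r2\rAlpha"],
    "id": [1, 2],
})

# Pass 1: drop every CR that is not followed by an ASCII uppercase letter.
step1 = df.replace(to_replace=r"\r(?![A-Z])", value="", regex=True)
# Pass 2: every CR left precedes A-Z, so it becomes a space.
result = step1.replace(to_replace=r"\r(?![a-z])", value=" ", regex=True)

print(result["dept"].tolist())
```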
