3

I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.

I've found some examples that do this by specifying the column(s) to search, like this:

df = pd.DataFrame([[1,'A'], [2,'(B,D,E)'], [3,'C']],columns=['Question','Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'

But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell using df.replace will not work since I'm only searching for a substring. Thanks for your help!

1 Answer 1

10

You can use data frame replace method with regex=True, and use .*,.* to match strings that contain a comma (you can replace comma with other any other substring you want to detect):

str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question   Answer
#0      1       A
#1      2       X
#2      3       C

or if you want to replace all string columns:

str_cols = df.select_dtypes(['object']).columns
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.