
I have a DataFrame df:

name  state
null   CA
Julia  null
Robert null
null   NJ

Both the name and state columns are string columns, so 'null' here is the literal text, not a real null.

I want to replace the string 'null' in the name column with None.

When I tried the replace function below, it converted every value in the name column to None. That is not what I expect; I want only the 'null' values to be converted to None.

df = df.withColumn('name', regexp_replace('name', 'null', None))

This is the output I get. I guess it is not able to recognize 'null':

name  state
null   CA
null   null
null   null
null   NJ

How can I solve this issue?

  • How do I do that? (Ronak, commented Jan 5, 2023 at 5:33)

1 Answer


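This should work. The .otherwise() is required: without it, rows that don't match the condition also become null, which would wipe out "Julia" and "Robert" as well.

from pyspark.sql import functions as F

df.withColumn("name", F.when(F.col("name") == "null", None).otherwise(F.col("name"))).show()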

To check the result, filter on the string "null" before and after the conversion. Looking at show() alone can be misleading, because a real None is also displayed as null:

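# Before: the rows whose name is the string "null"
df.filter(F.col("name").eqNullSafe("null")).show()

# After: no rows match, because those values are now real nulls
(df.withColumn("name", F.when(F.col("name") == "null", None).otherwise(F.col("name")))
   .filter(F.col("name").eqNullSafe("null"))
   .show())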



3 Comments

I tried this, but it isn't converting the string null to None. - Ronak
@karthik With show() the representation stays the same, but the underlying value changes: instead of the string "null" it will be None.
@karthik I added two outputs to clarify this.
