
In PySpark, how do I replace the text ( "\"\"") with an empty string? I tried regexp_replace(F.col('new'), '\\', ''), but it is not working.

The .csv file contains:

|"\\\"\\\""|

df.show() displays it like this:

\"\"

But I expect it to print an empty ('') string.

2 Answers


You should escape both the quotes and the backslashes (\) in the regex.

The regex for the text "\"\"" is: \"\\\"\\\"\"

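The Spark Scala code below works fine, and the same pattern should also work in PySpark, as long as the Python string literal does not consume the backslashes (for example, write the pattern as a raw string, r'...').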

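  // Assumes spark-shell (or any session with a SparkSession named `spark` in scope)
  import spark.implicits._
  import org.apache.spark.sql.functions.{col, regexp_replace}

  val inDF = List(""""\"\""""").toDF()

  inDF.show()

  /*
  +------+
  | value|
  +------+
  |"\"\""|
  +------+
  */

  // In the pattern, each \" matches a literal quote and each \\ a literal backslash
  inDF.withColumn("value", regexp_replace(col("value"), """\"\\\"\\\"\"""", "")).show()

  /*
  +-----+
  |value|
  +-----+
  |     |
  +-----+
  */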
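A likely reason the same code fails in PySpark is the extra layer of Python string-literal escaping: in an ordinary Python string, \" becomes " and \\ becomes \ before Spark ever sees the pattern. The sketch below uses Python's own re module rather than Spark, on the assumption that the escapes involved (\" and \\) behave the same in Python's re and in the Java regex engine used by regexp_replace; the variable names are hypothetical.

```python
import re

# The value from the Scala example: "\"\"" (quote, backslash, quote, backslash, quote, quote)
value = '"\\"\\""'

# Ordinary string: Python turns \" into " and \\ into \, so the regex engine
# receives "\"\"" and ends up looking for four quote characters in a row.
plain_pattern = '\"\\\"\\\"\"'

# Raw string: the backslashes survive, so the regex engine receives the same
# pattern as the Scala triple-quoted string: \"\\\"\\\"\"
raw_pattern = r'\"\\\"\\\"\"'

print(plain_pattern)  # "\"\""
print(raw_pattern)    # \"\\\"\\\"\"

# No match with the ordinary string, so the value comes back unchanged
print(repr(re.sub(plain_pattern, '', value)))
# The raw-string pattern matches the whole value
print(repr(re.sub(raw_pattern, '', value)))    # ''
```

Under the same assumption, the PySpark call would pass the raw string straight through, e.g. F.regexp_replace(F.col('value'), r'\"\\\"\\\"\"', '') with from pyspark.sql import functions as F; this has not been checked against a specific Spark version here.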

The text and the pattern you're using don't match each other.

The text you gave as an example evaluates to "" (two quote characters), while the pattern evaluates to a single backslash: \

Try running the following in a Python playground to see what I mean.

print("\"\"")   # prints ""
print('\\')     # prints \

I'm not sure about the rest, as I haven't used PySpark, and your code snippet may not include enough information to tell whether there are any other issues.
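For the value that df.show() displays in the question, \"\" (backslash, quote, backslash, quote), a small sketch with Python's re module shows both problems. It again stands in for Spark's Java regex engine, which is an assumption; Java's Pattern also rejects a pattern consisting of a lone backslash. The variable names are hypothetical.

```python
import re

# The value as df.show() displays it: \"\" (4 characters)
value = '\\"\\"'

# '\\' is a single backslash; on its own it is an unfinished regex escape.
try:
    re.compile('\\')
    lone_backslash_ok = True
except re.error as exc:
    lone_backslash_ok = False
    print('error:', exc)

# r'\\' is two characters, which the regex engine reads as one literal backslash,
# so only the backslashes are removed and two quotes remain.
print(repr(re.sub(r'\\', '', value)))      # '""'

# r'\\"\\"' matches backslash, quote, backslash, quote: the whole value.
print(repr(re.sub(r'\\"\\"', '', value)))  # ''
```

In PySpark this would correspond to something like F.regexp_replace(F.col('new'), r'\\"\\"', ''), using the column name from the question; it is untested here.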

Comment:

I tried the Scala code and it works fine, but the same code replicated in PySpark does not work.
