
I'd like to replace a value in one column by building the search string from another column.

Before:

id  address      st
1   2.PA1234.la  1234
2   10.PA125.la  125
3   2.PA156.ln   156

After:

id  address       st
1   2.PA9999.la   1234
2   10.PA9999.la  125
3   2.PA9999.ln   156
I tried

df.withColumn("address", regexp_replace("address","PA"+st,"PA9999"))
df.withColumn("address",regexp_replace("address","PA"+df.st,"PA9999")

Both seem to fail with

TypeError: 'Column' object is not callable

This could be similar to Pyspark replace strings in Spark dataframe column.

Comments:

  • Regex: (?<=PA)[^\.]+, substitution: 9999 (see the sketch below) – Commented Feb 20, 2018 at 1:00
  • thank you very much @S.Jovan, it worked as expected :) – Commented Feb 20, 2018 at 1:40
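
A minimal sketch of the comment's regex suggestion, assuming the goal is simply to overwrite whatever sits between "PA" and the following dot (the column and DataFrame names are taken from the question):

from pyspark.sql.functions import regexp_replace

# (?<=PA) is a lookbehind for the literal "PA"; [^.]+ then matches everything
# up to the next dot, so the digits after PA are replaced with 9999.
df = df.withColumn("address", regexp_replace("address", r"(?<=PA)[^.]+", "9999"))

This sidesteps building the pattern from st entirely, which is enough here because the digits after PA are exactly the st value.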

1 Answer


You could also use a Spark UDF.

This approach works whenever you need to modify a DataFrame entry using a value from another column:

import pandas as pd

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

sparkSession = SparkSession.builder.getOrCreate()

pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})

spark_df = sparkSession.createDataFrame(pd_input)

# Replace the per-row st value inside address with '9999'.
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())

spark_df.withColumn('address_new', replace_udf(col('address'), col('st'))).show()

Output:

+-----------+----+------------+
|    address|  st| address_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
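
As a side note: assuming a Spark version whose SQL regexp_replace accepts a per-row pattern inside an expression, the same result can be sketched without a UDF (names reused from the answer above; treat this as an untested alternative):

from pyspark.sql.functions import expr

# Build the pattern per row from the st column and replace it with 'PA9999'.
spark_df.withColumn(
    'address_new',
    expr("regexp_replace(address, concat('PA', st), 'PA9999')")
).show()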