
I'd like to replace a value in one column by building the search string from another column.

Before:

id  address      st
1   2.PA1234.la  1234
2   10.PA125.la  125
3   2.PA156.ln   156

After:

id  address       st
1   2.PA9999.la   1234
2   10.PA9999.la  125
3   2.PA9999.ln   156
I tried

df.withColumn("address", regexp_replace("address","PA"+st,"PA9999"))
df.withColumn("address",regexp_replace("address","PA"+df.st,"PA9999")

Both seem to fail with

TypeError: 'Column' object is not callable

This could be similar to Pyspark replace strings in Spark dataframe column.

Comments:

  • Regex: (?<=PA)[^\.]+, substitution: 9999 (see the sketch below) – Commented Feb 20, 2018 at 1:00
  • thank you very much @S.Jovan, it worked as expected :) – Commented Feb 20, 2018 at 1:40
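
A minimal sketch of the comment's regex suggestion, assuming the goal is simply to overwrite whatever sits between "PA" and the following dot (the column and DataFrame names are taken from the question):

from pyspark.sql.functions import regexp_replace

# (?<=PA) is a lookbehind for the literal "PA"; [^.]+ then matches everything
# up to the next dot, so the digits after PA are replaced with 9999.
df = df.withColumn("address", regexp_replace("address", r"(?<=PA)[^.]+", "9999"))

This sidesteps building the pattern from st entirely, which is enough here because the digits after PA are exactly the st value.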

1 Answer


You could also use a Spark UDF.

This approach works whenever you need to modify a DataFrame entry using a value from another column:

import pandas as pd

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

sparkSession = SparkSession.builder.getOrCreate()

pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})

spark_df = sparkSession.createDataFrame(pd_input)

# Replace the per-row st value inside address with '9999'.
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())

spark_df.withColumn('address_new', replace_udf(col('address'), col('st'))).show()

Output:

+-----------+----+------------+
|    address|  st| address_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
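
As a side note: assuming a Spark version whose SQL regexp_replace accepts a per-row pattern inside an expression, the same result can be sketched without a UDF (names reused from the answer above; treat this as an untested alternative):

from pyspark.sql.functions import expr

# Build the pattern per row from the st column and replace it with 'PA9999'.
spark_df.withColumn(
    'address_new',
    expr("regexp_replace(address, concat('PA', st), 'PA9999')")
).show()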