
Suppose you have a Spark dataframe containing some null values, and you would like to fill the nulls in one column with the values from another column where they are present. In Python/Pandas you can use the fillna() function to do this quite nicely:

df = spark.createDataFrame([('a', 'b', 'c'),(None,'e', 'f'),(None,None,'i')], ['c1','c2','c3'])
DF = df.toPandas()
DF['c1'].fillna(DF['c2']).fillna(DF['c3']) 

How can this be done using Pyspark?

1 Answer

You need to use the coalesce function from pyspark.sql.functions:

from pyspark.sql.functions import coalesce, lit

cDf = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
cDf.show()
# +----+----+
# |   a|   b|
# +----+----+
# |null|null|
# |   1|null|
# |null|   2|
# +----+----+

cDf.select(coalesce(cDf["a"], cDf["b"])).show()
# +--------------+
# |coalesce(a, b)|
# +--------------+
# |          null|
# |             1|
# |             2|
# +--------------+

cDf.select('*', coalesce(cDf["a"], lit(0.0))).show()
# +----+----+----------------+
# |   a|   b|coalesce(a, 0.0)|
# +----+----+----------------+
# |null|null|             0.0|
# |   1|null|             1.0|
# |null|   2|             0.0|
# +----+----+----------------+

You can also apply coalesce to multiple columns:

cDf.select(coalesce(cDf["a"], cDf["b"], lit(0))).show()
# ...

These examples are taken from the pyspark.sql API documentation.


2 Comments

Excellent. Worth noting that multiple columns can be passed for filling values cDf.select(coalesce(cDf["a"], cDf["b"], lit(0))).show()
Just make sure the column values are "null" and not "empty" strings. I had this problem and had to explicitly convert "empty" values in one of the columns to "null" using df.withColumn('myCol', when(col('myCol') == '', None).otherwise(col('myCol')))
