I need to merge multiple columns of a DataFrame into a single column in PySpark, as shown below. I am new to PySpark.

[Image: DataFrame input and output]

These columns were originally key-value pairs (10, 20, 30) at the source. I tried to use explode, but I get the error 'Column' object is not callable. The existing logic splits the column as shown in the input DataFrame above, so I would like to keep building on it.

  • Can you show the Python code you tried? That would make it easier to see where the issue is. Commented Oct 25, 2023 at 14:40

1 Answer

Without seeing your code, it is hard to tell what went wrong in your attempt, but you could use PySpark's coalesce, which returns the first non-null value among the given columns for each row:

from pyspark.sql.functions import coalesce, col

# withColumn returns a new DataFrame, so assign the result back
df = df.withColumn("NEW", coalesce(col("MARK1"), col("MARK2"), col("MARK3")))

Then drop the MARK1, MARK2, and MARK3 columns and rename NEW to MARK1.
