How do I fill the Nulls in the category column with the single distinct non-null value within each id group?
+---+--------+----------+
| id|category|      Date|
+---+--------+----------+
| A1|    Null|2010-01-02|
| A1|    Null|2010-01-03|
| A1|   Nixon|2010-01-04|
| A1|    Null|2010-01-05|
| A9|    Null|2010-05-02|
| A9| Leonard|2010-05-03|
| A9|    Null|2010-05-04|
| A9|    Null|2010-05-05|
+---+--------+----------+
Desired DataFrame:
+---+--------+----------+
| id|category|      Date|
+---+--------+----------+
| A1|   Nixon|2010-01-02|
| A1|   Nixon|2010-01-03|
| A1|   Nixon|2010-01-04|
| A1|   Nixon|2010-01-05|
| A9| Leonard|2010-05-02|
| A9| Leonard|2010-05-03|
| A9| Leonard|2010-05-04|
| A9| Leonard|2010-05-05|
+---+--------+----------+
I tried:
w = Window().partitionBy("ID").orderBy("Date")
df = df.withColumn(
    "category",
    F.when(col("category").isNull(), col("category").distinct().over(w))
     .otherwise(col("category")),
)
I also tried:
df = df.fillna({'category': col('category').distinct()})
I have also tried:
df = df.withColumn('category', when(df.category.isNull(), df.category.distinct()).otherwise(df.category))
df_new = df.withColumn('category', F.first('category', True).over(Window.partitionBy('id')))

first with the second argument ignorenulls=True should pick the first non-null value from the same partition. If there is any non-null value in a partition, the column should not end up all null. spark.apache.org/docs/latest/api/python/…