I want to replace the numbers in the date column with the corresponding values from the monthList array.
monthList array:
monthList = ["None","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
PySpark code:
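from pyspark.sql.functions import col, month, round  # note: this round shadows Python's built-in round
# Sum of sales (in millions) per invoice month
d = df.select(col('InvoiceDate'),col('TotalSales')/1000000).groupBy(month('InvoiceDate')).sum()
# Rename the auto-generated column names and sort by month number
d = d.select(col('month(InvoiceDate)').alias('date'),col('sum((TotalSales / 1000000))').alias('value')).orderBy('date')
# Round the totals to two decimal places
d = d.select(col('date'),round(col('value'),2).alias('value'))
d.show()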
Result:
+----+-----+
|date|value|
+----+-----+
|   1|19.75|
|   2|15.51|
|   3|20.66|
+----+-----+
I tried the following, but it does not work. It fails with the error: 'DataFrame' object has no attribute 'apply'
d.date = d.select('date').apply(lambda x: monthList[x])
Thank you for your help.
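For context, apply is a pandas DataFrame method. Spark DataFrames do not have it, which would explain the error above. Below is a minimal sketch of one possible direction, assuming a Python UDF is acceptable for this lookup. The names month_name and with_month_names are hypothetical helpers introduced for illustration, not part of PySpark, and d is assumed to be the DataFrame from the question with an integer date column.

```python
monthList = ["None", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_name(n):
    """Hypothetical helper: look up the abbreviation for month number n (1-12)."""
    # Pass nulls through instead of failing on monthList[None]
    if n is None:
        return None
    return monthList[n]


def with_month_names(d):
    """Hypothetical helper: return d with its numeric 'date' column replaced by names."""
    # Imported here so that month_name above can be tried without a Spark install
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the plain Python lookup as a Spark UDF that returns a string
    to_name = udf(month_name, StringType())
    # withColumn with an existing column name replaces that column
    return d.withColumn('date', to_name(col('date')))
```

Under these assumptions, with_month_names(d).show() would display the month abbreviations in the date column in place of the numbers. A UDF is only one option; it keeps the lookup in ordinary Python at the cost of some serialization overhead per row.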