I have a PySpark DataFrame with a NAME column. Sample values are as follows:
id | NAME
---+-------------
1  | aaa bb c
2  | xx yy z
3  | abc def
4  | qw er
5  | jon lee ls G
I need to take the rightmost part of each name (splitting on spaces), move it to the left side followed by a comma, and remove it from the right end.
Expected output:
id | NAME
---+---------------
1  | c, aaa bb
2  | z, xx yy
3  | def, abc
4  | er, qw
5  | G, jon lee ls
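To pin down the intended transformation independently of Spark, here is a minimal plain-Python sketch of the rule described above. The function name `flip_last_token` is hypothetical, and the sketch assumes tokens are separated by single spaces, as in the sample data.

```python
def flip_last_token(name: str) -> str:
    """Move the last space-separated token to the front, followed by a comma."""
    # rpartition splits on the LAST space: (everything before, separator, last token)
    head, sep, last = name.rpartition(" ")
    if not sep:
        # No space found: there is only one token, so leave the value unchanged
        return name
    return f"{last}, {head}"


samples = ["aaa bb c", "xx yy z", "abc def", "qw er", "jon lee ls G"]
for s in samples:
    print(flip_last_token(s))
```

Run on the sample values, this prints the five NAME values shown in the expected output.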
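I was able to extract the rightmost part using the code below:

    from pyspark.sql import functions as F
    from pyspark.sql.functions import split, regexp_replace

    # Split NAME on spaces and take the last element of the resulting array
    split_col = split(df['NAME'], ' ')
    df2 = df.withColumn('NAME_RIGHT', split_col.getItem(F.size(split_col) - 1))

The above code gives: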
NAME_RIGHT
c
z
def
er
I now want to remove the NAME_RIGHT value (the rightmost part) from the NAME column. I tried the code below, but it replaces nothing. How can this be achieved?

    df3 = df2.withColumn('NEW_NAME', regexp_replace(F.col("NAME"), str(df2.NAME_RIGHT), ""))
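A likely explanation, offered as a sketch rather than a confirmed answer: `str(df2.NAME_RIGHT)` does not produce the per-row value. It produces the text representation of the Column object itself (something like `Column<'NAME_RIGHT'>`), and that text is then used as the regex pattern, so it never matches. One possible workaround is to skip the helper column and let a single regex with capture groups do the reordering. In the sketch below, `add_new_name` and `LAST_TOKEN_PATTERN` are hypothetical names, and `pyspark` is imported inside the function only so the pattern can be checked without a Spark installation.

```python
# Group 1: everything before the last whitespace-separated token.
# Group 2: the last token itself.
LAST_TOKEN_PATTERN = r"^(.*)\s+(\S+)$"


def add_new_name(df, src="NAME", dst="NEW_NAME"):
    """Return df with a column holding '<last token>, <rest of the name>'."""
    # Imported here (an assumption of this sketch) so the module loads without Spark
    from pyspark.sql import functions as F

    # Spark uses Java regex replacement syntax, so groups are referenced as $1, $2.
    # Values with no whitespace do not match the pattern and are left unchanged.
    return df.withColumn(
        dst, F.regexp_replace(F.col(src), LAST_TOKEN_PATTERN, "$2, $1")
    )
```

Usage would be `df3 = add_new_name(df)`. The pattern relies on `.*` being greedy, so group 2 always captures only the final token.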