I have a DataFrame column containing an array of strings, where each string is a (key,value) pair, as below:
ColA
[(1,2),(1,3),(1,4),(2,3)]
I need to remove duplicate keys, keeping for each key the pair with the minimum value, and I don't want to explode the array to do it. Each key should be unique in the result. In the column above there are three pairs with key 1, so (1,2) should be picked, since 2 is the minimum value among (1,2), (1,3), (1,4).
Output should be: ColA [(1,2),(2,3)]
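I created a UDF like this, but it only splits each pair so far:
import org.apache.spark.sql.functions.udf
val removeDup = udf((arr: Seq[String]) => {
  arr.map(x => x.split(","))  // splits each "key,value" string; no de-duplication yet
})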
I cannot use reduceByKey, since that is an RDD method and this is a DataFrame/Dataset.
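To make the intended per-row logic concrete, here is a minimal sketch on a plain Scala collection, with no Spark involved. It assumes each element is a string of the form "(key,value)" with integer key and value. The names `RemoveDupSketch`, `parse` and `removeDupByMin` are made up for illustration, and keeping keys in order of first appearance is also an assumption.

```scala
// Sketch only: assumes every element looks like "(1,2)" with Int key and Int value.
object RemoveDupSketch {
  // Parse "(k,v)" into (k, v); the parenthesised format is an assumption.
  def parse(s: String): (Int, Int) = {
    val parts = s.trim.stripPrefix("(").stripSuffix(")").split(",")
    (parts(0).trim.toInt, parts(1).trim.toInt)
  }

  // For each key keep the smallest value; keys stay in order of first appearance.
  def removeDupByMin(arr: Seq[String]): Seq[String] = {
    val pairs = arr.map(parse)
    // groupBy on the in-memory Seq plays the role reduceByKey would play on an RDD.
    val minByKey = pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).min }
    pairs.map(_._1).distinct.map(k => s"($k,${minByKey(k)})")
  }

  def main(args: Array[String]): Unit = {
    // prints List((1,2), (2,3))
    println(removeDupByMin(Seq("(1,2)", "(1,3)", "(1,4)", "(2,3)")))
  }
}
```

Something like this could presumably be wrapped as `udf(RemoveDupSketch.removeDupByMin _)` and applied to ColA, which would avoid the explode, but I have not verified that against the real data.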