
I have the following table:

Id col1 col2
1  a    1   
1  b    2   
1  c    3   
2  a    1   
2  e    3   
2  f    4  

Expected output is:

Id col3
1  a1b2c3
2  a1e3f4

The aggregation involves two columns. Is this supported in SQL?

  • I think an aggregation can only operate on one column, so I need to combine the two columns to form a new column and then aggregate on that new column. (Commented Jul 1, 2022 at 1:11)

1 Answer


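In Spark SQL you can concatenate the two columns, collect the results for each Id into an array, and fold that array into a single string. Because collect_list does not guarantee the order of the collected elements, the array is sorted with array_sort first: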

SELECT Id, aggregate(list, '', (acc, x) -> concat(acc, x)) col3
FROM (SELECT Id, array_sort(collect_list(concat(col1, col2))) list
      FROM df
      GROUP BY Id )

Or, in a single SELECT:

SELECT Id, aggregate(array_sort(collect_list(concat(col1, col2))), '', (acc, x) -> concat(acc, x)) col3
FROM df
GROUP BY Id
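To make the steps of the query concrete, here is a minimal plain-Python model of what it computes on the sample table. This is only an illustration, not Spark: the function name col3_by_id is hypothetical, and Python's sorted() stands in for array_sort (both order strings ascending, lexicographically). Note that sorting the concatenated strings orders elements by col1 first, which matches the expected output here.

```python
from collections import defaultdict
from functools import reduce

# Sample rows from the question: (Id, col1, col2)
rows = [
    (1, "a", 1), (1, "b", 2), (1, "c", 3),
    (2, "a", 1), (2, "e", 3), (2, "f", 4),
]

def col3_by_id(rows):
    """Hypothetical helper mirroring the GROUP BY query above."""
    groups = defaultdict(list)
    for id_, col1, col2 in rows:
        # concat(col1, col2): combine both columns into one string value
        groups[id_].append(f"{col1}{col2}")
    result = {}
    for id_, values in groups.items():
        # array_sort: ascending order of the concatenated strings
        ordered = sorted(values)
        # aggregate(list, '', (acc, x) -> concat(acc, x)): left fold from ''
        result[id_] = reduce(lambda acc, x: acc + x, ordered, "")
    return result

print(col3_by_id(rows))  # {1: 'a1b2c3', 2: 'a1e3f4'}
```

The key point is the same one raised in the comment: combining the two columns into one value first turns a two-column aggregation into an ordinary single-column one.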

This example uses the higher-order function aggregate, which is documented as follows:

aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
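As a rough mental model, aggregate behaves like a left fold followed by an optional finishing step. The sketch below is a plain-Python analogue written for illustration (it is not Spark's implementation); the optional finish argument defaults to returning the final state unchanged.

```python
from functools import reduce

def aggregate(expr, start, merge, finish=None):
    """Plain-Python analogue of Spark's aggregate(expr, start, merge, finish)."""
    # Apply merge to the running state and each element, left to right
    state = reduce(merge, expr, start)
    # Convert the final state into the result, if a finish function is given
    return state if finish is None else finish(state)

# String concatenation, as in the answer's query
print(aggregate(["a1", "b2", "c3"], "", lambda acc, x: acc + x))  # a1b2c3

# Sum the elements, then transform the final state with finish
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))  # 60
```

The answer's query uses only the three-argument form, since the concatenated string is already the desired result and needs no finishing step.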
