aggregate multiple columns in sql table as json or array

I would like to aggregate some columns into an array or json object on redash. The data table is on presto database. I need to query it from pyspark hive. The data table is large and I need to keep its size as small as possible so that I can save the dataframe to s3 faster and then read it as parquet from s3 efficiently.

I am not sure what the best data structure should be for this? (json object? array of array ?)

The original table (> 10^9 rows, some columns (e.g. obj_desc) may have more than 30 English words):

id.    cat_name.   cat_desc.       obj_name.    obj_desc.   obj_num
1.     furniture    living office   desk         4 corners    1.5.      
1      furniture.   living office.  chair.       4 legs.      0.8
1.     furniture.   restroom.       tub.         white wide.  2.7
1.     cloth.       fashion.        T-shirt.     black large. 1.1

I want (this may not be the best data structure):

id.     cat_item_aggregation
 1.     [['furniture', ['living office', ['desk', '4 corners', '1.5'], ['chair', '4 legs', '0.8']], ['restroom', [['tub', 'white wide', '2.7']], ['cloth', ['fashion', ['T-shirt', 'black', '1.1']]]]

I have tried array_agg from PostgreSQL: Efficiently aggregate array columns as part of a group by Postgres - aggregate two columns into one item

also json_build_object from Return as array of JSON objects in SQL (Postgres) How to group multiple columns into a single array or similar?

but they do not work in redash.

Could anybody let me know what the best data structure should be for this kind of table ? and how to build it ?

json may be better than array of array because it is hard to decompose the elements from array of array ?

thanks

edited Jan 22, 2022 at 17:32

asked Jan 22, 2022 at 17:04

user3448011

1,6094 gold badges23 silver badges48 bronze badges

What do you mean by they do not work? Do you get an error? Id so the what is the exact and complete error message? Do they just not produce what you expect? If so then exactly what do they produce and exactly what do you expect? Further having 30 (or more) English words means nothing do you actually mean populated columns? Would be best if you posted the actual table definition (ddl).

Belayer
– Belayer

2022-01-22 19:17:21 +00:00
Commented Jan 22, 2022 at 19:17
"the best data structure", if any, highly depends on the way you want to query this aggregate data. Can you give examples of typical queries/criteria to query this data ? Also, did you investigate the VIEW as a solution for your need ?

Edouard
– Edouard

2022-01-22 20:38:22 +00:00
Commented Jan 22, 2022 at 20:38

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

aggregate multiple columns in sql table as json or array

0

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Linked