0

My use case is as following:

I want to copy data from Table-A to Table-B and transform field1 from array of struct to array of string where the string is the val1 property of struct in table-A and ignore val2.

Table-A:
field1: array<struct<val1: str, val2: int>>
sample data:
[{val1: "abc", val2: 123}, {val1: "def", val2: 456}], [{val1: "xyz", val2: 789}]

Table-B:
field1: array<string>
sample data:
["abc", "def"], ["xyz"]

I am not able to figure out how I can select the field1 column with transformation on it through hive query language.

The things I was able to figure out was I could explode the array, perform a select of the val1 and then do collect_list but after trying multiple times could not get the correct syntax.

My query was like:

select collect_list(select col.val1 
  from explode(field1) as col) from table-A

I also want to strictly do it through hiveql and not through a udf in python.

Thank you.

2
  • can you post some sample data with the expected output and your attempt? Commented Aug 14, 2020 at 13:49
  • @VamsiPrabhala, added the sample data. my query was like: select collect_list(select col.val1 from explode(field1) as col) from table-A Commented Aug 14, 2020 at 14:07

2 Answers 2

1

Explode original array using Lateral View + explode and collect array of struct.val1 using collect_set() or collect_list():

with mydata as (--This is your data example, use your table instead of this CTE
select stack (2,
array(named_struct("val1", "abc", "val2", 123), named_struct("val1", "def", "val2", 456)), 
array(named_struct("val1", "xyz", "val2", 789))
) as myarray
)

select t.myarray as original_array, collect_set(s.val1) as result_array
  from mydata t
       lateral view explode(myarray) e as s --struct
group by t.myarray 

Result:

original_array                                          result_array
[{"val1":"abc","val2":123},{"val1":"def","val2":456}]   ["abc","def"]
[{"val1":"xyz","val2":789}]                             ["xyz"]

Your structure can be also declared as map, not struct. In this case use s['val1'] instead of s.val1 to get map element.

Sign up to request clarification or add additional context in comments.

Comments

0

There is some magic in the handling of arrays that allow you to do this:

select t.myarray as original_array, t.myarray.val1 from mydata t

I.e. selecting struct field val1 from an array of structs will return an array of val1

From http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.