0

Fairly new to Snowflake but have loaded data successfully via various methods.

Problem: I am calling a Web API via Azure Data Factory using the pagination rules to write to a single JSON file in blob storage. This is done using multiple calls via single ADF data copy activity utilising the AbsoluteURL to merge to a single file (I could write extra, complex ADF logic to generate multiple files, but for this example want to go with a single file approach). This is really efficient in my ADF pipeline and I can either write as an array or set of objects.

The ADF part works well, but I am having trouble parsing the "merged" JSON file when loading into a table of rows in Snowflake via a stage.

The output JSON file called "setOfObjects.json" looks like this for ADF JSON sink setting: setOfObjects (JSON lines):

{"values":[{"a":"1","b":"1","c":"1"},{"a":"2","b":"2","c":"2"}]}
{"values":[{"a":"3","b":"3","c":"3"},{"a":"4","b":"4","c":"4"}]}
<empty line>

In Snowflake, I create a stage such as

create or replace stage dbo.stage_json_example url = 'azure://<endpoint>.blob.core.windows.net/test/stackOverflow/'
credentials = (azure_sas_token = '<sas_token>')
file_format = (type = 'json');

Then in Snowflake, I try to parse the JSON file such as, but it returns NULL value:

select $1:values:a::string
from @dbo.stage_json_example/setOfObjects.json;

From above, I want to convert the single JSON file into 4 rows being:

a b c
1 1 1
2 2 2
3 3 3
4 4 4

Any suggestions are appreciated. Thanks

4
  • Please include the code you used to define @dbo.stage_json_example. Also, try to have only one JSON format to parse - check ADF learn.microsoft.com/en-us/azure/data-factory/format-json to set the output to JSON Lines (the default) Commented Jan 5, 2021 at 20:51
  • Yes I have tried both setOfObjects (JSON Lines),which is the "Set of Object" example above, as well as arrayOfObjects which is the "Set of Array" example above. Have also tried all of the "Copy Behaviour" options which do not change the JSON file format. Snowflake stage is standard and I have different stage options for both strip_outer_array values. create or replace stage dbo.stage_json_example url = '<url>' credentials = (azure_sas_token = '<sas_token>') file_format = (type = 'json',strip_outer_array = <true/false>); Commented Jan 5, 2021 at 23:22
  • For simplicity of reproducing the problem - can you give me one file example, and one setup example? I'm confused by having the 2 alternatives in the question. If I can reproduce for one file, then I can help you solve the issue. Commented Jan 5, 2021 at 23:26
  • @FelipeHoffa Hi, I figured out the problem by looking through a bunch of Snowflake articles. Thanks for your help. Commented Jan 6, 2021 at 0:23

2 Answers 2

1

OK, found the answer from this link https://community.snowflake.com/s/article/json-data-parsing-in-snowflake

This Snowflake code solved the problem.

select t.value:a::int as "a", t.value:b::int as "b", t.value:c::int  as "c"
from @dbo.stage_json_example/setOfObjects.json as S
, table(flatten(S.$1,'values')) t;
Sign up to request clarification or add additional context in comments.

Comments

0

Perhabs you try to write a { at the beginning and } at the end of your document. JSON always needs a document root. With this change you have one.

so your Example would looks like:

{
 {"values":[{"a":"1","b":"1","c":"1"},{"a":"2","b":"2","c":"2"}]}
 {"values":[{"a":"3","b":"3","c":"3"},{"a":"4","b":"4","c":"4"}]} 
}

3 Comments

Pls note there are 3 lines created by ADF. In Snowflake I get this error: Error parsing JSON: misplaced [, pos 2 when trying select parse_json(*) from ( select concat('{', $1, '}') v from @DBO.stage_json_example ) t;
and snowflakes needs a document root, I had the same issue with gson. gson needs one document root, otherwise parsing is not posible. With rapidjson, we did write the data to one file, without document root.
OK understand, but I am hoping for a solution that does not require post processing of the multiple line json file created in ADF. Is there Snowflake logic that can handle this? if not possible I will look to more complex solutions such as single files or file manipulation as you mentioned. For example I cannot apply this select V from ( select '{'::VARIANT V, 1 as Sort UNION select $1::VARIANT V, 2 as Sort from @DBO.stage_json_example union select '}'::VARIANT V, 3 as Sort ) t order by Sort;

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.