
I'm looking for an effective way to upload the following array to a BigQuery table in this format:

BigQuery columns (example):

event_type: video_screen
event_label: click_on_screen
is_ready:false
time:202011231958
long:1
high:43
length:0

Array object:

[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]

I thought of several options, but none of them seems to be the best practice.

Option A: Upload the file to Google Cloud Storage and then create a table pointing at this bucket. This didn't work because of the file format; BigQuery can't parse the array from the bucket.

Option B: Use my backend (Node.js) to convert the file to CSV and upload it directly to BigQuery. This failed because of latency (the array is much longer than my example).

Option C: Use Google Apps Script to read the array object and insert it into BigQuery. I didn't find simple code for this, and Google Cloud Storage has no API connected to Apps Script.

Has anyone dealt with such a case and can share their solution? What is the best practice here? Code examples would be great.

  • Option D: write the file directly in CSV or another supported format. Then you can insert the rows into BQ easily. Commented Nov 27, 2020 at 13:37
  • What's the size of the longest line in your file? Commented Nov 27, 2020 at 13:46
  • 100K lines per file, but I process multiple files every 5 minutes. Commented Nov 27, 2020 at 13:49
  • Lines per file are not the problem. What's the longest line in your file? How long is a single line? Commented Nov 27, 2020 at 13:53
  • 6 columns, max 10. Commented Nov 27, 2020 at 13:54

1 Answer


Load the file from GCS into a BigQuery table with one single STRING column, so you get 100K rows and one column.

Essentially you will have a table that holds the JSON as a string.
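For the load step, one way to get each full line into a single column (per the comment thread below) is to load the file as CSV with a field delimiter that never appears in the data and quoting disabled. Here is a minimal sketch using BigQuery's LOAD DATA statement; the dataset, table, and bucket names are placeholders, and the same options exist as flags on bq load:

-- Sketch only: mydataset.events_raw and the bucket path are assumed names.
-- Loading as "CSV" with a delimiter that never occurs in the data puts each
-- line of the file into one STRING column.
LOAD DATA INTO mydataset.events_raw (raw_json STRING)
FROM FILES (
    format = 'CSV',
    field_delimiter = '~',  -- any character absent from the data
    quote = '',             -- disable quoting so the JSON quotes pass through
    uris = ['gs://my-bucket/events/*']
);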

Use JSON_EXTRACT_ARRAY to split the JSON string into array elements, then extract each position into its corresponding column and write the result to a table.

Here is a demo:

-- simulate the staging table: one row, one string column holding the JSON
with t as (
    select '[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]' as s
),
-- explode the outer array: one row per inner array
elements as (
    select e from t, unnest(json_extract_array(t.s)) e
)
-- pull each position of the inner array into its own column
select
    json_extract_scalar(e, '$[0]') as event_type,
    json_extract_scalar(e, '$[1]') as event_label
from elements

The output is three rows (one per inner array), with event_type and event_label as separate columns.

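To turn the demo into a real table, the same pattern can extract all seven positions and materialize them, for example with CREATE TABLE AS SELECT. A sketch, assuming the staging table mydataset.events_raw with a raw_json column from the load step above and a target table mydataset.events (the column names follow the example schema in the question):

-- Sketch: explode the staging rows and materialize all seven columns.
-- mydataset.events_raw, raw_json, and mydataset.events are assumed names.
create or replace table mydataset.events as
with elements as (
    select e
    from mydataset.events_raw t,
         unnest(json_extract_array(t.raw_json)) e
)
select
    json_extract_scalar(e, '$[0]') as event_type,
    json_extract_scalar(e, '$[1]') as event_label,
    json_extract_scalar(e, '$[2]') as is_ready,
    json_extract_scalar(e, '$[3]') as time,
    json_extract_scalar(e, '$[4]') as long,
    json_extract_scalar(e, '$[5]') as high,
    json_extract_scalar(e, '$[6]') as length
from elements;

Note that every column comes out as STRING here; add CAST or PARSE_TIMESTAMP calls in the SELECT if you want numeric or timestamp types.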


7 Comments

It sounds good, but how do I load the file from GCS into a BigQuery table with one single string column and 100K rows? BigQuery can't parse this type of file: [[1,2,3],[1,2,3]]
@idan Load it as CSV format and specify a separator that doesn't exist in your rows, such as TAB, ~, or ^. This way you load each entire line as one column.
The whole file is loaded into one single cell; how do I get separate cells?
That means all your input is on one single line. It's not a problem if the file ends up in one cell; that is exactly what you need. Now use my example to explode it.
I'm not sure this is the right thing to do. All the data will go into one single cell? Is there no limit? A string limit?
