
I am trying to load a JSON array into a BigQuery table. The structure of the data is as follows:

[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]

Each JSON record corresponds to one row in BigQuery. The columns in BigQuery are image and component. When I try to ingest the data, it fails with a parsing error. If I change the structure to the following, it works:

 {"image":"testimage1","component":"component1"}{"image":"testimage2","component":"component2"}

I am trying to ingest it as NEWLINE_DELIMITED_JSON. Is there any way I can get the first JSON structure ingested into BigQuery?


2 Answers


No. BigQuery's JSON loader expects newline-delimited JSON, where each line is a standalone JSON object; it cannot load a file whose top level is a JSON array.

You have to transform it slightly:

  • Either wrap it in a JSON object (add `{"object":` at the beginning and a `}` at the end of the line). Ingest that JSON into a temporary table, then run a query that scans the temporary table and inserts the correct values into the target table.
  • Or remove the array brackets `[]` and replace each `},{` with `}\n{` to produce newline-delimited JSON.
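The second option can be done in a few lines of Python. A minimal sketch, using the array from the question; note that it parses and re-serializes the array rather than doing a literal `},{` string replacement, which is safer if a value ever contains that substring:

```python
import json

raw = '[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]'

# Parse the array once, then emit one JSON object per line
# (newline-delimited JSON, as BigQuery expects).
ndjson = "\n".join(json.dumps(obj) for obj in json.loads(raw))
print(ndjson)
```

Writing `ndjson` to a file then gives you input that loads cleanly with NEWLINE_DELIMITED_JSON.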

Alternatively, you can ingest your JSON as a CSV file (you will have only one column, containing the raw JSON text) and then use BigQuery's string and JSON functions to transform the data and insert it into the target table.
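A sketch of the second half of that CSV detour (the staging table and column names here are hypothetical): once each raw JSON line sits in a single string column, BigQuery's JSON functions can pull the fields out. The extraction the SQL would perform is shown locally with the stdlib json module:

```python
import json

# Hypothetical SQL that would run against the staging table:
#   SELECT
#     JSON_EXTRACT_SCALAR(raw, '$.image')     AS image,
#     JSON_EXTRACT_SCALAR(raw, '$.component') AS component
#   FROM staging.raw_json
#
# The same per-row extraction, done locally for illustration:
raw = '{"image":"testimage1","component":"component1"}'
row = json.loads(raw)
print(row["image"], row["component"])  # testimage1 component1
```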


1 Comment

Yeah, but an array can be valid JSON.

You can follow this approach: loop through the list and write it to a newline-delimited JSON file, then load that file into BigQuery.

from google.cloud import bigquery
import json

client = bigquery.Client(project="project-id")

dataset_id = "dataset-id"
table_id = "bqjson"

list_dict = [{"image": "testimage1", "component": "component1"},
             {"image": "testimage2", "component": "component2"}]

# Write one JSON object per line (newline-delimited JSON).
with open("sample-json-data.json", "w") as jsonwrite:
    for item in list_dict:
        jsonwrite.write(json.dumps(item) + "\n")


dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("sample-json-data.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="us",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

