I am looking for a way to set a BigQuery schema from a JSON file in Python. The following documentation says I can define it one SchemaField at a time, but I would like to find a more efficient way. https://cloud.google.com/bigquery/docs/schemas

I doubt that autodetect would work reliably in this case. I would appreciate any help.

  • Do you mean that you provide a JSON file that describes your schema, and you want BigQuery's field definitions to match it? Commented Aug 7, 2020 at 19:05
  • Yes, I do. Is there a way to do what you described? Commented Aug 8, 2020 at 8:37

2 Answers

You can create a JSON file that lists the columns and their data types, then use the code below to build the BigQuery schema.

JSON File (schema.json):

[
    {
        "name": "emp_id",
        "type": "INTEGER"
    },
    {
        "name": "emp_name",
        "type": "STRING"
    }
]

Python Code:

import json
from google.cloud import bigquery

bigquerySchema = []
with open('schema.json') as f:
    bigqueryColumns = json.load(f)
    for col in bigqueryColumns:
        bigquerySchema.append(bigquery.SchemaField(col['name'], col['type']))

print(bigquerySchema)

1 Comment

Does it still work if the schema.json file has nested fields?
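
On the nested-fields question: the loop above reads only name and type, so a nested "fields" list would be silently ignored. Below is a minimal sketch of a recursive version, assuming nested columns are declared with type RECORD and carry their children in a "fields" list. The names build_schema, load_schema and make_field are hypothetical and not part of the library; make_field defaults to bigquery.SchemaField and is injectable only so that the logic can be tried without the library installed.

```python
import json


def build_schema(entries, make_field=None):
    """Recursively turn JSON schema entries into schema field objects."""
    if make_field is None:
        # Imported lazily; assumes google-cloud-bigquery is installed.
        from google.cloud import bigquery
        make_field = bigquery.SchemaField

    schema = []
    for entry in entries:
        kwargs = dict(entry)  # copy so the caller's data is not mutated
        # SchemaField names this argument field_type, not type
        kwargs["field_type"] = kwargs.pop("type")
        # Nested columns keep their children under "fields"; recurse into them
        children = kwargs.pop("fields", [])
        kwargs["fields"] = tuple(build_schema(children, make_field))
        schema.append(make_field(**kwargs))
    return schema


def load_schema(path, make_field=None):
    """Read a schema JSON file and build the schema from it."""
    with open(path) as f:
        return build_schema(json.load(f), make_field)
```

Calling load_schema('schema.json') should then return a list of SchemaField objects, with any RECORD columns holding their sub-fields. Recent versions of google-cloud-bigquery also appear to provide Client.schema_from_json, which may make a hand-written loop unnecessary; that is worth checking against the installed version.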

Soumendra Mishra's answer is already helpful, but here is a slightly more general version that can optionally accept additional fields such as mode or description:

JSON File (schema.json):

[
    {
        "name": "emp_id",
        "type": "INTEGER",
        "mode": "REQUIRED"
    },
    {
        "name": "emp_name",
        "type": "STRING",
        "description": "Description of this field"
    }
]

Python Code:

import json
from google.cloud import bigquery

table_schema = []
# open the JSON file read-only
with open('schema.json', 'r') as f:
    # keep the raw entries in their own list so table_schema is not overwritten
    schema_entries = json.load(f)

for entry in schema_entries:
    # rename key; bigquery.SchemaField expects `type` to be called `field_type`
    entry["field_type"] = entry.pop("type")
    # ** unpacks the dict into keyword arguments (e.g. name="emp_id")
    table_schema.append(bigquery.SchemaField(**entry))
