
Summary

How can I use Python to specify the target partition when loading data into an ingestion-time partitioned BigQuery table?

What we tried

I found that specifying the partition is possible when inserting with SQL: https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables

but I don't know how to do the equivalent in Python. I am planning to use "client.load_table_from_dataframe" from the google-cloud-bigquery module. https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.load_table_from_dataframe

I found the following sample, but when I use the column name _PARTITIONTIME I get the error below. https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-partitioned#bigquery_load_table_partitioned-python

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/aaa/jobs?uploadType=multipart: Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, _TABLE_, _FILE_, _ROW_TIMESTAMP, __ROOT__ and _COLIDENTIFIER
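The rejection happens server-side, but the rule quoted in the error message is simple to restate. This is a minimal sketch of that naming rule, with the prefix list copied verbatim from the error text (the `is_reserved` helper is my own illustration, not part of the BigQuery client):

```python
# Prefix list taken verbatim from the BadRequest error message above.
RESERVED_PREFIXES = (
    "_PARTITION",
    "_TABLE_",
    "_FILE_",
    "_ROW_TIMESTAMP",
    "__ROOT__",
    "_COLIDENTIFIER",
)

def is_reserved(field_name: str) -> bool:
    """Return True if a field name would be rejected by the load job
    (the check is case-insensitive, per the error message)."""
    return field_name.upper().startswith(RESERVED_PREFIXES)

print(is_reserved("_PARTITIONTIME"))  # True  -> rejected
print(is_reserved("_partitiontime"))  # True  -> still rejected (case-insensitive)
print(is_reserved("_P1"))             # False -> accepted
```

This is why any schema or DataFrame column named `_PARTITIONTIME` fails before the data is even loaded.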

Execution environment

  • python: 3.8.10
  • google-cloud-bigquery: 3.2.0
  • pandas: 1.4.3
  • Authentication: credentials are not the problem, since the load succeeds when no partition column is specified.

Table

CREATE TABLE IF NOT EXISTS `aaa.bbb.ccc`(
  c1 INTEGER,
  c2 STRING
)
PARTITION BY _PARTITIONDATE;

What I want to do

SQL

INSERT INTO `aaa.bbb.ccc` (c1, c2, _PARTITIONTIME) VALUES (99, "zz", TIMESTAMP("2000-01-02"));

Python (the code I tried)

import pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlTypeNames
from google.cloud.bigquery.job import WriteDisposition
from datetime import datetime

client = bigquery.Client(project="aaa")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("c1", SqlTypeNames.INTEGER),
        bigquery.SchemaField("c2", SqlTypeNames.STRING),
        bigquery.SchemaField("_PARTITIONTIME", SqlTypeNames.TIMESTAMP),
    ],
    write_disposition=WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="_PARTITIONTIME",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
df = pd.DataFrame(
    [
        [1, "a", datetime.strptime("2100-11-12", "%Y-%m-%d")],
        [2, "b", datetime.strptime("2101-12-13", "%Y-%m-%d")],
    ],
    columns=["c1", "c2", "_PARTITIONTIME"],
)
job = client.load_table_from_dataframe(df, "aaa.bbb.ccc", job_config=job_config)  # raises the BadRequest above
result = job.result()

Cross-post

The same question was also asked (in Japanese) at: https://ja.stackoverflow.com/questions/90760

1 Answer


You can just rename _PARTITIONTIME to something else, since it starts with one of the reserved (case-insensitive) prefixes. The code below worked:

import pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlTypeNames
from google.cloud.bigquery.job import WriteDisposition
from datetime import datetime

client = bigquery.Client(project="<your-project>")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("c1", SqlTypeNames.INTEGER),
        bigquery.SchemaField("c2", SqlTypeNames.STRING),
        bigquery.SchemaField("_P1", SqlTypeNames.TIMESTAMP),
    ],
    write_disposition=WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="_P1",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
df = pd.DataFrame(
    [
        [1, "a", datetime.strptime("2100-11-12", "%Y-%m-%d")],
        [2, "b", datetime.strptime("2101-12-13", "%Y-%m-%d")],
    ],
    columns=["c1", "c2", "_P1"],
)
job = client.load_table_from_dataframe(df, "<your-project>.<your-dataset>.ccc", job_config=job_config)
result = job.result()

Output: both rows are loaded into the table (screenshot omitted).

As for the query you want to insert:

INSERT INTO `<your-project>.<your-dataset>.ccc` (c1, c2, _P1) VALUES (99, "zz", TIMESTAMP("2000-01-02"));

This is not possible, as explained in this SO post answered by a Googler. Because expiration_ms sets a 90-day partition expiration, only partition dates within the last 90 days (counted from the day the Python script is executed) are valid; anything older than that expires immediately, so a row dated 2000-01-02 cannot survive. This query will work:

INSERT INTO `<your-project>.<your-dataset>.ccc` (c1, c2, _P1) VALUES (99, "zz", TIMESTAMP("2022-06-01"));
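The 90-day figure follows directly from the expiration_ms value in the job config above; a quick sketch of the arithmetic (the variable names are mine, for illustration only):

```python
from datetime import date, timedelta

# expiration_ms from the LoadJobConfig above, converted to days.
EXPIRATION_MS = 7776000000
expiration_days = EXPIRATION_MS // (1000 * 60 * 60 * 24)
print(expiration_days)  # 90

# Oldest partition date that survives the expiration policy, relative
# to the day the script runs; older partitions are deleted.
oldest_valid = date.today() - timedelta(days=expiration_days)
print(oldest_valid)
```

So whether a given INSERT "works" depends on when you run it, not just on the date literal in the query.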

Output: the row is inserted successfully (screenshot omitted).
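As an aside: if you want to keep the original ingestion-time table from the question (PARTITION BY _PARTITIONDATE) rather than switch to a partitioning column, BigQuery load jobs can also target a single partition via a partition decorator (`table$YYYYMMDD`). This is an untested sketch under that assumption; I have not verified it against google-cloud-bigquery 3.2.0, the `partition_decorator` helper is my own, and the commented-out client calls assume credentials for the question's project `aaa`:

```python
from datetime import date

def partition_decorator(table_id: str, day: date) -> str:
    """Build a partition-decorator destination like `aaa.bbb.ccc$20000102`."""
    return f"{table_id}${day.strftime('%Y%m%d')}"

# Hypothetical usage (requires google-cloud-bigquery and valid credentials):
# from google.cloud import bigquery
# client = bigquery.Client(project="aaa")
# job_config = bigquery.LoadJobConfig(
#     write_disposition="WRITE_APPEND",
#     time_partitioning=bigquery.TimePartitioning(type_="DAY"),  # no field: ingestion-time
# )
# # df carries only c1 and c2; the partition comes from the decorator, not a column.
# destination = partition_decorator("aaa.bbb.ccc", date(2000, 1, 2))
# client.load_table_from_dataframe(df, destination, job_config=job_config).result()

print(partition_decorator("aaa.bbb.ccc", date(2000, 1, 2)))  # aaa.bbb.ccc$20000102
```

Note that if the table had a partition expiration set, the same 90-day constraint discussed above would apply to the decorator date as well.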
