Summary
How can I specify the target partition of an ingestion-time partitioned table when loading data with Python?
What I tried
I found that the following is possible when inserting with SQL DML: https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables
However, I don't know how to do the equivalent in Python. I am thinking of using `client.load_table_from_dataframe` from the google-cloud-bigquery module. https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.load_table_from_dataframe
I also found the following sample, but when I use the column name _PARTITIONTIME I get the error below.
https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-partitioned#bigquery_load_table_partitioned-python
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/aaa/jobs?uploadType=multipart: Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, _TABLE_, _FILE_, _ROW_TIMESTAMP, __ROOT__ and _COLIDENTIFIER
Execution environment
- python: 3.8.10
- google-cloud-bigquery: 3.2.0
- pandas: 1.4.3
- About authentication: if I do not specify a partition, the data is inserted successfully, so I believe authentication is not the problem.
Table
CREATE TABLE IF NOT EXISTS `aaa.bbb.ccc`(
c1 INTEGER,
c2 STRING
)
PARTITION BY _PARTITIONDATE;
What I want to do
SQL
INSERT INTO `aaa.bbb.ccc` (c1, c2, _PARTITIONTIME) VALUES (99, "zz", TIMESTAMP("2000-01-02"));
Python (the code I tried)
import pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlTypeNames
from google.cloud.bigquery.job import WriteDisposition
from datetime import datetime
client = bigquery.Client(project="aaa")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("c1", SqlTypeNames.INTEGER),
        bigquery.SchemaField("c2", SqlTypeNames.STRING),
        bigquery.SchemaField("_PARTITIONTIME", SqlTypeNames.TIMESTAMP),
    ],
    write_disposition=WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="_PARTITIONTIME",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
df = pd.DataFrame(
    [
        [1, "a", datetime.strptime("2100-11-12", "%Y-%m-%d")],
        [2, "b", datetime.strptime("2101-12-13", "%Y-%m-%d")],
    ],
    columns=["c1", "c2", "_PARTITIONTIME"],
)
job = client.load_table_from_dataframe(df, "aaa.bbb.ccc", job_config=job_config)  # raises BadRequest
result = job.result()
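One direction I am considering, based on the load-job documentation, is to address the destination with a partition decorator (`table$YYYYMMDD`) instead of putting a `_PARTITIONTIME` column in the DataFrame, since load jobs accept decorated table IDs for ingestion-time partitioned tables. A rough sketch, untested against my table; `partition_decorator` and `load_into_partition` are helper names I made up for illustration:

```python
from datetime import date

def partition_decorator(table_id: str, day: date) -> str:
    # Build a partition-decorated table ID, e.g. "aaa.bbb.ccc$20000102".
    return f"{table_id}${day:%Y%m%d}"

def load_into_partition(df, table_id: str, day: date):
    # Hypothetical helper: append df into one specific ingestion-time
    # partition by using the decorated table ID as the destination.
    # Note: df must NOT contain a _PARTITIONTIME column here.
    from google.cloud import bigquery

    client = bigquery.Client(project="aaa")
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_dataframe(
        df, partition_decorator(table_id, day), job_config=job_config
    )
    return job.result()
```

With this approach each call can only target a single partition, so a DataFrame spanning several dates would need to be split and loaded per day.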
Cross-post
I have also asked this question on the Japanese Stack Overflow: https://ja.stackoverflow.com/questions/90760

