Include multiple SQL files under one task in BigQueryInsertJobOperator

Question

My code is shown below. I want the three SQL files in the list to be executed sequentially inside a single BigQueryInsertJobOperator task.

This approach executes only the first SQL file. Is there an alternative approach that solves this problem?

```python
t1 = BigQueryInsertJobOperator(
    task_id='data_load',
    configuration={
        "query": {
            "query": "{% include ['sqlfile_1.sql', 'sqlfile_2.sql', 'sqlfile_3.sql'] %}",
            "useLegacySql": False,
        }
    },
    gcp_conn_id='bq_conn',
)
```

Scott B · Accepted Answer · 2022-06-01 07:26:21Z

I could not find a way to run multiple SQL files under one task ID. As an alternative, you can loop over the SQL files and create one task per file, each with its own task ID, all within a single DAG.

Please see the sample code below:

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.utils.dates import days_ago

PROJECT_ID = "your-project-id"
DATASET_NAME = "your-dataset"

TABLE_1 = "your-table"
dag_id = "your-dag-id"

# SQL files to run, one task per file
sql_files = [
    'my-query1.sql',
    'my-query2.sql',
    'my-query3.sql',
]

with models.DAG(
    dag_id,
    schedule_interval=None,  # Override to match your needs
    start_date=days_ago(1),
    tags=["example"],
    user_defined_macros={"DATASET": DATASET_NAME, "TABLE": TABLE_1},
) as dag:

    # Tasks are created without dependencies between them
    for sqlrun in sql_files:
        # Use the file name without its extension as the task ID
        my_final_taskid = sqlrun.split(".")[0]
        case_june_1 = BigQueryInsertJobOperator(
            task_id=my_final_taskid,
            configuration={
                "query": {
                    # File name; .sql is in the operator's template_ext,
                    # so Airflow loads the file's contents as the query
                    "query": sqlrun,
                    "useLegacySql": False,
                }
            },
        )

In addition, per the documentation, the query parameter accepts only a STRING, which is then parsed and run as the query.
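Since the loop creates the tasks without dependencies between them, they are not guaranteed to run one after another. Here is a minimal sketch of one way to get sequential execution: build one configuration per file, then chain the resulting tasks in list order. The helper name build_query_jobs is hypothetical (it is not part of Airflow), and the commented wiring assumes Airflow 2.x with the Google provider installed and the SQL files reachable from the DAG folder or template search path.

```python
from pathlib import PurePosixPath


def build_query_jobs(sql_files):
    """Return (task_id, configuration) pairs, one per SQL file, in list order.

    Hypothetical helper for illustration; not part of Airflow.
    """
    jobs = []
    for name in sql_files:
        task_id = PurePosixPath(name).stem  # "my-query1.sql" -> "my-query1"
        configuration = {
            "query": {
                # Jinja include of a single file, rendered by Airflow at run time
                "query": "{% include '" + name + "' %}",
                "useLegacySql": False,
            }
        }
        jobs.append((task_id, configuration))
    return jobs


jobs = build_query_jobs(["my-query1.sql", "my-query2.sql", "my-query3.sql"])
print([task_id for task_id, _ in jobs])

# Inside the `with models.DAG(...) as dag:` block (assumes Airflow 2.x):
#
#     from airflow.models.baseoperator import chain
#
#     tasks = [
#         BigQueryInsertJobOperator(task_id=task_id, configuration=configuration)
#         for task_id, configuration in jobs
#     ]
#     chain(*tasks)  # my-query1 >> my-query2 >> my-query3
```

With the tasks chained this way, each file still gets its own task (and its own retry and log), but a file starts only after the previous one has succeeded.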

Jasmine · Accepted Answer · 2022-06-01 09:41:16Z


Running multiple SQL files in one task is not recommended.

However, you can execute multiple SQL statements in one round trip by concatenating them with semicolons:

sqlfile_1.sql

SELECT *
FROM table_1

sqlfile_2.sql

SELECT *
FROM table_2

Concatenate the SQL files, separating the statements with ; to get:

sqlfiles.sql (add ; at the end of each statement)

SELECT *
FROM table_1;
SELECT *
FROM table_2;

(I am not sure whether this will work for you, but it does work in BigQuery.)
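For context, a Jinja include that is given a list of names includes only the first template that exists, which is why the attempt in the question ran a single file. If you do want one task, here is a rough sketch of the concatenation step in Python. The function names combine_statements and combine_sql_files are hypothetical, and the sketch assumes each file is plain SQL that may or may not already end with a semicolon.

```python
from pathlib import Path


def combine_statements(sql_texts):
    """Join SQL texts into one script, each terminated by exactly one ';'."""
    statements = []
    for text in sql_texts:
        # Drop surrounding whitespace and any trailing semicolon
        cleaned = text.strip().rstrip(";").rstrip()
        if cleaned:  # skip empty files
            statements.append(cleaned + ";")
    return "\n".join(statements)


def combine_sql_files(paths):
    """Read each file in order and combine the contents (hypothetical helper)."""
    return combine_statements(Path(p).read_text(encoding="utf-8") for p in paths)


script = combine_statements([
    "SELECT *\nFROM table_1\n",
    "SELECT *\nFROM table_2;",
])
print(script)
```

The resulting string can then be passed as the "query" value in the operator's configuration, so BigQuery receives all statements as a single multi-statement job. The trade-off is that a failure in any statement fails the whole task, which is part of why one task per file is usually preferred.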
