
Hey, I am a beginner in dbt and I am trying to create the staging layer. I have created the table schema and now want to automate the process of creating models, as I have to create over 100 SQL models.

My schema is like (xyz.yml):

- schema: xyz

  tables:
    - name: abc
      loaded_at_field: updated_at
      freshness:
        warn_after: {count: 24, period: hour}
        filter: updated_at > current_date - 7
    - name: def
      loaded_at_field: updated_at
      freshness:
        warn_after: {count: 24, period: hour}
        filter: updated_at > current_date - 7
    - name: ghi
    - name: jkl
    - name: mno

An SQL model should be generated for each table name. I have more than 100 tables and would like to create the staging model SQL files automatically (like xyz_abc.sql) for all the table names in dbt.

3 Answers


You could write a little Python script for this (or use any other tool, for that matter).

Granted, that has the big dependency of knowing how to write Python...

The code should be something like:

import yaml

# Path placeholder kept as-is; point it at your sources file
with open('{YOUR PATH TO THE SCHEMA FILE}/schema.yaml', 'r') as file:
    sources = yaml.safe_load(file)

# The schema file is a list of source entries, each with its own 'tables' list
for source in sources:
    schema = source['schema']
    for table in source['tables']:
        name = table['name']
        # Name each file xyz_abc.sql, as requested in the question
        with open(f'{schema}_{name}.sql', 'w') as out:
            out.write(f"SELECT * FROM {{{{ source('{schema}', '{name}') }}}}\n")

NOTE: I did not test this code

1

You could write a Python script as @MYK mentioned above, but there is also a dbt package for generating the schema.yml and model files.

Take a look at the dbt-codegen package: https://github.com/dbt-labs/dbt-codegen

You might still want to use some Python to clean up the output and provide table names dynamically to codegen.
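As a sketch of what "providing table names dynamically" could look like (the table names and the `models/staging/` path are taken from the question and are assumptions about your project layout), you could build one `dbt run-operation` invocation of codegen's `generate_base_model` macro per table:

```python
import json

# Table names as listed in the question's xyz.yml
tables = ['abc', 'def', 'ghi', 'jkl', 'mno']

# Build one dbt-codegen invocation per table; each writes a staging model.
# Note: the command output may include dbt log lines you'd need to strip.
commands = []
for name in tables:
    args = json.dumps({'source_name': 'xyz', 'table_name': name})
    commands.append(
        f"dbt run-operation generate_base_model --args '{args}' "
        f"> models/staging/xyz_{name}.sql"
    )

for cmd in commands:
    print(cmd)
```

Running these commands still requires dbt-codegen to be installed in your project's packages.yml.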

1 Comment

Hi, I'm struggling with the output using CliRunner and dbt.cli.main; any tips on capturing the dbt-codegen YAML output? Thanks

In general, I am quite critical of the idea of generating 100 models. The reason is simple: unless you just need to read the data of these 100 tables and expose them "as is", you will need to apply some business logic to them.

If you are in the first case... why do you need dbt at all?

If you are going to apply some business logic, writing the code is the least time-consuming part: if you want to materialize the data and save changes, you need to know the primary key; if you want to combine data from multiple systems, you need to know the business keys, have mapping tables, and have some idea of how to apply master data management. Writing code that you could have generated is the least of the problems.

A project with 100 tables is no trivial work and, assuming you actually need all 100 tables, you will have to understand each of them and write business rules for them. In that context, automatic model generation saves only a tiny fraction of the time spent on each table... so why bother?

IMO it is much better to have something that saves you the grunt work, while you still write each model yourself so you are sure to apply the right pattern.

Also, I prefer adding tables only when needed, using something like the dbt-codegen package or, if you have a repeatable pattern you want to apply, a self-written SQL query that uses the COLUMNS view from INFORMATION_SCHEMA to give you the table-specific values, which you then plug into the template that applies the pattern.

A query like the following already goes a long way toward giving you the bulk of the table, so that you can rename the columns you do not like and apply eventual casts or other hard business rules with minimal effort:

SELECT ', ' || COLUMN_NAME || ' as '|| COLUMN_NAME || ' -- ' || DATA_TYPE as SQL_TEXT
FROM <db>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'XXX' and TABLE_NAME = 'YYY'
ORDER BY ORDINAL_POSITION;
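The same templating idea can be sketched in Python; the (column name, data type) pairs below are made-up stand-ins for what the INFORMATION_SCHEMA.COLUMNS query above would return:

```python
# Hypothetical (column_name, data_type) pairs, as the INFORMATION_SCHEMA
# query would return them for one table
columns = [("ID", "NUMBER"), ("NAME", "VARCHAR"), ("UPDATED_AT", "TIMESTAMP_NTZ")]

# Mirror the SQL template: one line per column, data type as a trailing comment
lines = [f", {col} as {col} -- {dtype}" for col, dtype in columns]

# Drop the leading comma from the first line to get a valid SELECT list
lines[0] = lines[0][2:]
select_stmt = "SELECT\n" + "\n".join(lines) + "\nFROM source_table"
print(select_stmt)
```

From here you edit the generated lines by hand wherever a rename or cast is needed, instead of typing every column from scratch.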

And then you add one model at a time, when you actually need it (the YAGNI principle), instead of starting by "loading all tables" from some data source.

PS: You do not need to repeat the same freshness SLA definition 100 times. You can declare it once at the source level and override only the parameters that differ for a specific table. Start by saving complexity where it is easy ;)
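As a sketch (source and table names taken from the question; the 12-hour override is a made-up value), a source-level freshness block with a single table-level override looks like this in a dbt sources file:

```yaml
sources:
  - name: xyz
    schema: xyz
    loaded_at_field: updated_at
    freshness:                      # applies to every table in this source
      warn_after: {count: 24, period: hour}
    tables:
      - name: abc                   # inherits the source-level freshness
      - name: def
        freshness:                  # table-level override
          warn_after: {count: 12, period: hour}
          filter: updated_at > current_date - 7
```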

