
Hey, I am a beginner in dbt and I am trying to create the staging layer. I have created the table schema and now want to automate the process of creating models, as I have to create over 100 SQL models.

My schema is like (xyz.yml):

- schema: xyz

  tables:
    - name: abc
      loaded_at_field: updated_at
      freshness:
        warn_after: {count: 24, period: hour}
        filter: updated_at > current_date - 7
    - name: def
      loaded_at_field: updated_at
      freshness:
        warn_after: {count: 24, period: hour}
        filter: updated_at > current_date - 7
    - name: ghi
    - name: jkl
    - name: mno

An SQL model should be generated for each table name. I have more than 100 tables and would like to create the staging model SQL files automatically (like xyz_abc.sql) for all the table names in dbt.

3 Answers


You could write a little Python script for this (or use any other tool, for that matter).

Granted, that has the big dependency of knowing how to write Python...

The code should be something like:

import yaml

# Path placeholder kept as-is; point it at your sources file
with open('{YOUR PATH TO THE SCHEMA FILE}/schema.yaml', 'r') as file:
    sources = yaml.safe_load(file)

# The schema file is a list of source entries, each with its own 'tables' list
for source in sources:
    schema = source['schema']
    for table in source['tables']:
        name = table['name']
        # Name each file xyz_abc.sql, as requested in the question
        with open(f'{schema}_{name}.sql', 'w') as out:
            out.write(f"SELECT * FROM {{{{ source('{schema}', '{name}') }}}}\n")

NOTE: I did not test this code

1

You could write a Python script as @MYK mentioned above, but there is also a dbt package for generating the schema.yml and model files.

Take a look at the dbt-codegen package: https://github.com/dbt-labs/dbt-codegen

You might still want to use some Python to clean up the output and provide table names dynamically to codegen.
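As a sketch of what "providing table names dynamically" could look like (the table names and the `models/staging/` path are taken from the question and are assumptions about your project layout), you could build one `dbt run-operation` invocation of codegen's `generate_base_model` macro per table:

```python
import json

# Table names as listed in the question's xyz.yml
tables = ['abc', 'def', 'ghi', 'jkl', 'mno']

# Build one dbt-codegen invocation per table; each writes a staging model.
# Note: the command output may include dbt log lines you'd need to strip.
commands = []
for name in tables:
    args = json.dumps({'source_name': 'xyz', 'table_name': name})
    commands.append(
        f"dbt run-operation generate_base_model --args '{args}' "
        f"> models/staging/xyz_{name}.sql"
    )

for cmd in commands:
    print(cmd)
```

Running these commands still requires dbt-codegen to be installed in your project's packages.yml.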

1 Comment

Hi, I'm struggling with the output using CliRunner and dbt.cli.main; any tips on capturing the dbt-codegen YAML output? Thanks

In general, I am quite critical of the idea of generating 100 models. The reason is simple: unless you just need to read the data of these 100 tables and expose them "as is", you will need to apply some business logic to them.

If you are in the first case... why do you need dbt at all?

If you are going to apply some business logic, writing the code is the least time-consuming part: if you want to materialize the data and save changes, you need to know the primary key; if you want to combine data from multiple systems, you need to know the business keys, have mapping tables, and have some idea of how to apply master data management. Writing code that you could have generated is the least of the problems.

A project with 100 tables is no trivial work and, assuming you actually need all 100 tables, you will have to understand each of them and write business rules for them. In that context, automatic model generation saves only a tiny fraction of the time spent on each table... so why bother?

IMO it is much better to have something that saves you the grunt work, while you still write each model yourself so you are sure to apply the right pattern.

Also, I prefer adding tables only when needed, using something like the dbt-codegen package or, if you have a repeatable pattern you want to apply, a self-written SQL query that uses the COLUMNS view from INFORMATION_SCHEMA to give you the table-specific values, which you then plug into the template that applies the pattern.

A query like the following already goes a long way toward giving you the bulk of the table, so that you can rename the columns you do not like and apply eventual casts or other hard business rules with minimal effort:

SELECT ', ' || COLUMN_NAME || ' as '|| COLUMN_NAME || ' -- ' || DATA_TYPE as SQL_TEXT
FROM <db>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'XXX' and TABLE_NAME = 'YYY'
ORDER BY ORDINAL_POSITION;
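The same templating idea can be sketched in Python; the (column name, data type) pairs below are made-up stand-ins for what the INFORMATION_SCHEMA.COLUMNS query above would return:

```python
# Hypothetical (column_name, data_type) pairs, as the INFORMATION_SCHEMA
# query would return them for one table
columns = [("ID", "NUMBER"), ("NAME", "VARCHAR"), ("UPDATED_AT", "TIMESTAMP_NTZ")]

# Mirror the SQL template: one line per column, data type as a trailing comment
lines = [f", {col} as {col} -- {dtype}" for col, dtype in columns]

# Drop the leading comma from the first line to get a valid SELECT list
lines[0] = lines[0][2:]
select_stmt = "SELECT\n" + "\n".join(lines) + "\nFROM source_table"
print(select_stmt)
```

From here you edit the generated lines by hand wherever a rename or cast is needed, instead of typing every column from scratch.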

And then you add one model at a time, when you actually need it (the YAGNI principle), instead of starting by "loading all tables" from some data source.

PS: You do not need to repeat the same freshness SLA definition 100 times. You can declare it once at the source level and override only the parameters that differ for a specific table. Start by saving complexity where it is easy ;)
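As a sketch (source and table names taken from the question; the 12-hour override is a made-up value), a source-level freshness block with a single table-level override looks like this in a dbt sources file:

```yaml
sources:
  - name: xyz
    schema: xyz
    loaded_at_field: updated_at
    freshness:                      # applies to every table in this source
      warn_after: {count: 24, period: hour}
    tables:
      - name: abc                   # inherits the source-level freshness
      - name: def
        freshness:                  # table-level override
          warn_after: {count: 12, period: hour}
          filter: updated_at > current_date - 7
```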

