I have a set of 35 staging models, one per object type, which I union in my intermediate layer. For the sake of this post I have replaced the name of my data source with xxx. To perform the union, I use the following Jinja for loop, which calls the ref function on each of those models.

Relevant SQL code

{% set object_types = dbt_utils.get_column_values(
  table=ref("ref_include_object_types"),
  column="object_type"
) %}

WITH base AS (
  {%- for object_type in object_types %}
    SELECT
      '{{ object_type }}' AS object_type,
      {{ object_type }}   AS object_value
    FROM {{ ref(object_type) }}
    {%- if not loop.last %}
      UNION
    {%- endif %}
  {% endfor %}
)

However, the code above results in a Compilation Error and suggests that I add -- depends_on: {{ ref(object_type_name) }} statements for all object types:

Compilation Error:

Compilation Error in model int_xxx__combine_object_instances (models\intermediate\xxx\int_xxx__combine_object_instances.sql)

dbt was unable to infer all dependencies for the model "int_xxx__combine_object_instances".

To fix this, add the following hint to the top of the model "int_xxx__combine_object_instances":

  -- depends_on: {{ ref('stg_xxx__object_type_name') }}

Considered solutions:

Solution 1

I can obviously create a list of -- depends_on: statements dynamically to deal with this issue, but I would rather avoid this, since it would make the code harder to maintain and read.

Solution 2

Another solution is suggested in this issue's comment:

{% for model in object_types %}
{% set depends_on = "--depends_on: {{  ref( '" ~  model ~ "' )  }}" %}
{{  depends_on  }}
{% endfor %}

This solution did not work for me, but I am unsure why: I get exactly the same compilation error as before. The solution is both upvoted and downvoted on GitHub, but no one has really commented on why it is a good or bad solution or why it might not work. I assume it does not work because the dependencies are obtained through Jinja macros that are not rendered at compile time.

Questions

  1. Can someone help me understand why solution 2 would not work?
  2. Are there solutions to this issue other than the two solutions I have suggested above?
  3. Would it be possible to somehow set a dependency on a group of models? E.g. set a dependency on a schema so that all transformations within the staging schema are finished before my intermediate layer transformation starts?

Comments

  • Could you help me reproduce the issue? I created two dummy models and put their names into a variable: {% set object_types = ['dummy_1', 'dummy_2'] %}. I then used your code, only replacing the list of columns with *, and it worked without a problem. Is there another part of the code that causes the issue? If so, could you update your question with it? Commented Jul 16, 2024 at 17:12
  • @KlimentMerzlyakov I added how the object_types are calculated, in case it's relevant. It looks like hardcoding the object types works, but using the dbt_utils function to extract them from a column causes the compilation error. I guess that's because the dbt_utils function is a macro. Commented Jul 17, 2024 at 11:57

2 Answers

0

The issue stems from the implementation of the dbt_utils.get_column_values function used to acquire the object_types.

{% set object_types = dbt_utils.get_column_values(
  table=ref("ref_include_object_types"),
  column="object_type"
) %}

Checking the source code one can find the following lines:

{# Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
    {% set default = [] if not default %}
    {{ return(default) }}
{% endif %}

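The guard exists so that the macro never queries the database while dbt is parsing the project. During parsing, execute is false, so get_column_values returns the default (an empty list) instead of the column values. The for loop therefore has nothing to iterate over, no ref(object_type) call is made, and dbt records no dependencies on the staging models. At run time the macro returns the real values, the loop calls ref() on models that were never registered as dependencies, and dbt raises the compilation error. Solution 2 fails for the same reason: it loops over the same list, which is empty at parse time, so its depends_on hints are never rendered while dbt builds the dependency graph.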
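
One way to observe the parse-time behaviour is to log the list in a throwaway model. This is only a sketch: the model name debug_object_types is hypothetical, the seed is the one from the question and is assumed to be loaded already (dbt seed), and whether the parse-time message appears may depend on partial parsing.

```sql
-- models/debug_object_types.sql (hypothetical throwaway model)
{% set object_types = dbt_utils.get_column_values(
  table=ref("ref_include_object_types"),
  column="object_type"
) %}

{# Rendered while dbt parses the project (execute is False, so the list
   should be empty) and again when the model is compiled or run
   (execute is True, so the list should hold the seed values). #}
{% do log("execute=" ~ execute ~ ", object_types=" ~ object_types, info=True) %}

SELECT 1 AS placeholder
```

Running dbt compile on this model should print one message per phase, which makes it easy to confirm that the loop in the original model sees an empty list when dependencies are collected.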

I dealt with the issue by replacing my seed table with project variables and creating macros for parsing them.
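
A minimal sketch of that approach, not necessarily the exact macros used here. It assumes the model names are stored in a project variable called object_types (a hypothetical name) in dbt_project.yml, and that every staging model exposes a column named object_value (also an assumption). Since var() is resolved while dbt parses the project, the loop runs at parse time and each ref() call is registered as a dependency.

```sql
{# Assumed entry in dbt_project.yml (names are hypothetical):
   vars:
     object_types: ['stg_xxx__type_a', 'stg_xxx__type_b']
#}
{% set object_types = var('object_types') %}

WITH base AS (
  {%- for object_type in object_types %}
    SELECT
      '{{ object_type }}' AS object_type,
      object_value  -- assumed column present in every staging model
    FROM {{ ref(object_type) }}
    {%- if not loop.last %}
    UNION
    {%- endif %}
  {%- endfor %}
)

SELECT * FROM base
```

The trade-off is that the list of object types now lives in project configuration rather than in a seed, so adding a type means editing dbt_project.yml.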

0

I've finally found a solution to set the dependencies dynamically.

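Refer to this macro: https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/get_relations_by_pattern.sql

Because get_relations_by_pattern returns relation objects found in the database instead of calling ref(), dbt has no unresolved references to complain about, so the forced dependency comment (-- depends_on: {{ ref('table_name') }}) is not needed. Note that, as far as I can tell, this also means dbt does not record the matched tables as upstream dependencies of the model, so they must already exist when the model runs.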

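{% set relations = dbt_utils.get_relations_by_pattern('%dataset_name_pattern%', '%table_name_pattern%') -%}
{% for relation in relations -%}
select
    *
from
    {{ relation }}
{# Join consecutive selects so that the rendered SQL is valid. #}
{% if not loop.last -%}
union all
{% endif -%}
{% endfor -%}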

If you ever need to add conditions based on the table ID, you can approach it this way (it may not be optimal):

{% set relations = dbt_utils.get_relations_by_pattern('%dataset_name_pattern%', '%table_name_pattern%') -%}
{% for relation in relations -%}
select
    {# Pick the id column depending on which table is being read. #}
    {% if relation.path.identifier == 'required_table' -%}
    col1 as id
    {% else -%}
    col2 as id
    {% endif -%}
from
    {{ relation }}
{% if not loop.last -%}
union all
{% endif -%}
{% endfor -%}

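relation.path.identifier (equivalent to relation.identifier) returns only the table name (e.g. table_id), which is why it is compared with 'required_table' above. Rendering {{ relation }} produces the fully qualified path (e.g. project_id.dataset_id.table_id).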
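
If the matched tables have compatible columns, the manual loop can probably be replaced with dbt_utils.union_relations, which takes the list of relations directly. This is a sketch under assumptions: the schema and table patterns below are placeholders, not values from the question.

```sql
{# 'staging' and 'stg_xxx__%' are placeholder patterns; adjust them to your project. #}
{% set relations = dbt_utils.get_relations_by_pattern(
  schema_pattern='staging',
  table_pattern='stg_xxx__%'
) %}

WITH base AS (
  {# union_relations aligns columns by name and, by default, adds a
     _dbt_source_relation column identifying the source table. #}
  {{ dbt_utils.union_relations(relations=relations) }}
)

SELECT * FROM base
```

The added _dbt_source_relation column can stand in for the hand-written object_type literal when you need to know which table a row came from.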

Hopefully, this helps!
