
I want to implement SCD2 on Snowflake tables. My source and target tables are both in Snowflake, and the entire process has to be done using Azure Data Factory. I went through the documentation given by Azure for implementing SCD2 using data flows, but when I tried to create a dataset for the Snowflake connection, it is shown as disabled.

Is there any way, or any documentation, that shows the steps to build SCD2 in ADF with Snowflake tables?

Thanks, Vipendra

  • I guess with SCD2 you mean slowly changing dimensions type 2? As far as I know, Azure Data Factory does not support that out of the box, and you need to write your own custom SQL that fits your data. Commented Jun 10, 2020 at 14:10
  • Yes, I mean slowly changing dimensions. Actually, I am quite new to both ADF and SQL, and we have an urgent requirement to implement this. Can you share a link showing how it can be implemented with custom SQL, or any SQL snippets you have? That would be great. Commented Jun 10, 2020 at 18:05

2 Answers


SCD2 in ADF can be built and managed graphically via data flows. However, the Snowflake connector for ADF does not yet work directly with data flows. So for now, you will need to use the Copy activity in an ADF pipeline to stage the dimension data in Blob or ADLS, then build your SCD2 logic in data flows against the staged data.

Your pipeline will look something like this:

[Copy Activity Snowflake-to-Blob] -> [Data Flow SCD2 logic Blob-to-Blob] -> [Copy Activity Blob-to-Snowflake]

We are working on direct connectivity to Snowflake from data flows and hope to land that soon.
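For reference, the "custom SQL" route mentioned in the comments above boils down to an expire-then-insert pattern. Here is a minimal sketch of that pattern run inside Snowflake against a staging table; the table and column names (stg_customer, dim_customer, customer_id, and so on) are purely illustrative and will need to match your own schema:

-- Step 1: close out the current dimension rows whose attributes have changed.
merge into dim_customer d
using stg_customer s
  on d.customer_id = s.customer_id
 and d.is_current = true
when matched and (d.name <> s.name or d.city <> s.city) then
  update set is_current = false,
             end_date   = current_timestamp();

-- Step 2: insert a new current row for every changed or brand-new key.
insert into dim_customer (customer_id, name, city, start_date, end_date, is_current)
select s.customer_id, s.name, s.city, current_timestamp(), null, true
from stg_customer s
left join dim_customer d
  on d.customer_id = s.customer_id
 and d.is_current = true
where d.customer_id is null;

Because step 1 expires the changed rows first, the anti-join in step 2 picks up both changed keys and brand-new keys, while leaving unchanged keys alone.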


9 Comments

Thanks Mark... It looks promising. I will give it a try.
Hi @MarkKromer, I tried this approach, but the problem I am facing is that I have to truncate the table in Snowflake every time before copying. Is there any way this can be avoided? All of the updated data is in CSV files which have to be pushed to Snowflake, and if I do not truncate, it results in duplicates.
Not quite yet. We are working on finishing the Snowflake connector for Data Flows directly, which will eliminate the need for staging and then you can use Alter Row for updates, upserts, etc.
Thanks @MarkKromer, it would definitely be very helpful in our future use cases. Is there an expected launch date, so that at least I can promise my customers that we will be able to use it directly in the future?
@VipendraSingh Microsoft released support for Snowflake in mapping data flows yesterday; techcommunity.microsoft.com/t5/azure-data-factory/…. Might be worth a look if this is an outstanding question for you.

If your source and target tables are both in Snowflake, you could use Snowflake Streams to do this. There's a blog post covering this in more detail at https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1

However, in short, if you have a source table called source, you can put a stream on it like so:

create or replace stream source_changes on table source;

This will capture all the changes that are made to the source table. You can then build a view on that stream that establishes how you want to feed those changes into the SCD table. (The blog post uses case statements to add start and end dates to each row in the view.)
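A minimal sketch of such a view, assuming a dimension keyed on customer_id with a couple of attribute columns (all names here are illustrative; the blog post's version is more elaborate):

create or replace view source_changes_vw as
select
    customer_id,                           -- hypothetical business key
    name,
    city,
    metadata$action    as change_action,   -- stream metadata: 'INSERT' or 'DELETE'
    metadata$isupdate  as is_update,       -- true when the change is part of an update
    case when metadata$action = 'INSERT'
         then current_timestamp() end as dim_start_date,
    case when metadata$action = 'DELETE' and metadata$isupdate
         then current_timestamp() end as dim_end_date
from source_changes;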

From there, you can use a Snowflake task to automate the process of loading from the stream into the SCD table only when the stream actually has changes.
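As a rough sketch, assuming the view from above and a warehouse called my_wh (the warehouse name and the dim_customer target are placeholders), the task could look something like this; the stream offset advances when the DML inside the task consumes it:

create or replace task load_dim_customer
  warehouse = my_wh
  schedule = '15 minute'
when
  system$stream_has_data('SOURCE_CHANGES')   -- only run when the stream has new rows
as
  insert into dim_customer (customer_id, name, city, dim_start_date, dim_end_date)
  select customer_id, name, city, dim_start_date, dim_end_date
  from source_changes_vw;

-- Tasks are created suspended; resume the task to start the schedule.
alter task load_dim_customer resume;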

1 Comment

Yeah, actually my team is working on two approaches: one through streams, and the other using ADF. I have to do it from ADF, so please let me know if there is any way to do it from ADF using data flows.
