
I want to implement SCD2 on Snowflake tables. My source and target tables are both in Snowflake, and the entire process has to be done using Azure Data Factory. I went through the documentation given by Azure for implementing SCD2 using data flows, but when I tried to create a dataset for the Snowflake connection, it is shown as disabled.

Is there any way, or any documentation, that shows the steps to build SCD2 in ADF with Snowflake tables?

Thanks, Vipendra

  • I guess with SCD2 you mean slowly changing dimensions type 2? As far as I know, Azure Data Factory does not support that out of the box, and you need to write your own custom SQL that fits your data. Commented Jun 10, 2020 at 14:10
  • Yes, I mean slowly changing dimensions. Actually, I am quite new to both ADF and SQL, and we have an urgent requirement to implement this. Can you share a link showing how it can be implemented with custom SQL, or any SQL snippets you have? That would be great. Commented Jun 10, 2020 at 18:05

2 Answers


SCD2 in ADF can be built and managed graphically via data flows. However, the Snowflake connector for ADF does not yet work directly with data flows. So for now, you will need to use the Copy activity in an ADF pipeline to stage the dimension data in Blob or ADLS, then build your SCD2 logic in data flows against the staged data.

Your pipeline will look something like this:

[Copy Activity Snowflake-to-Blob] -> [Data Flow SCD2 logic Blob-to-Blob] -> [Copy Activity Blob-to-Snowflake]

We are working on direct connectivity to Snowflake from data flows and hope to land that soon.
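For reference, the "custom SQL" route mentioned in the comments above boils down to an expire-then-insert pattern. Here is a minimal sketch of that pattern run inside Snowflake against a staging table; the table and column names (stg_customer, dim_customer, customer_id, and so on) are purely illustrative and will need to match your own schema:

-- Step 1: close out the current dimension rows whose attributes have changed.
merge into dim_customer d
using stg_customer s
  on d.customer_id = s.customer_id
 and d.is_current = true
when matched and (d.name <> s.name or d.city <> s.city) then
  update set is_current = false,
             end_date   = current_timestamp();

-- Step 2: insert a new current row for every changed or brand-new key.
insert into dim_customer (customer_id, name, city, start_date, end_date, is_current)
select s.customer_id, s.name, s.city, current_timestamp(), null, true
from stg_customer s
left join dim_customer d
  on d.customer_id = s.customer_id
 and d.is_current = true
where d.customer_id is null;

Because step 1 expires the changed rows first, the anti-join in step 2 picks up both changed keys and brand-new keys, while leaving unchanged keys alone.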


9 Comments

Thanks Mark... It looks promising. I will give it a try.
Hi @MarkKromer, I tried this approach, but the problem I am facing is that I have to truncate the table in Snowflake every time before copying. Is there any way this can be avoided? All of the updated data is in CSV files which have to be pushed to Snowflake, and if I do not truncate, it results in duplicates.
Not quite yet. We are working on finishing the Snowflake connector for Data Flows directly, which will eliminate the need for staging and then you can use Alter Row for updates, upserts, etc.
Thanks @MarkKromer, it would definitely be very helpful in our future use cases. Is there an expected launch date, so that at least I can promise my customers that we will be able to use it directly in the future?
@VipendraSingh Microsoft released support for Snowflake in mapping data flows yesterday; techcommunity.microsoft.com/t5/azure-data-factory/…. Might be worth a look if this is an outstanding question for you.

If your source and target tables are both in Snowflake, you could use Snowflake Streams to do this. There's a blog post covering this in more detail at https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1

However, in short, if you have a source table called source, you can put a stream on it like so:

create or replace stream source_changes on table source;

This will capture all the changes that are made to the source table. You can then build a view on that stream that establishes how you want to feed those changes into the SCD table. (The blog post uses case statements to add start and end dates to each row in the view.)
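A minimal sketch of such a view, assuming a dimension keyed on customer_id with a couple of attribute columns (all names here are illustrative; the blog post's version is more elaborate):

create or replace view source_changes_vw as
select
    customer_id,                           -- hypothetical business key
    name,
    city,
    metadata$action    as change_action,   -- stream metadata: 'INSERT' or 'DELETE'
    metadata$isupdate  as is_update,       -- true when the change is part of an update
    case when metadata$action = 'INSERT'
         then current_timestamp() end as dim_start_date,
    case when metadata$action = 'DELETE' and metadata$isupdate
         then current_timestamp() end as dim_end_date
from source_changes;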

From there, you can use a Snowflake task to automate the process of loading from the stream into the SCD table only when the stream actually has changes.
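As a rough sketch, assuming the view from above and a warehouse called my_wh (the warehouse name and the dim_customer target are placeholders), the task could look something like this; the stream offset advances when the DML inside the task consumes it:

create or replace task load_dim_customer
  warehouse = my_wh
  schedule = '15 minute'
when
  system$stream_has_data('SOURCE_CHANGES')   -- only run when the stream has new rows
as
  insert into dim_customer (customer_id, name, city, dim_start_date, dim_end_date)
  select customer_id, name, city, dim_start_date, dim_end_date
  from source_changes_vw;

-- Tasks are created suspended; resume the task to start the schedule.
alter task load_dim_customer resume;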

1 Comment

Yeah, actually my team is working on two approaches: one through streams, and the other using ADF. I have to do it from ADF, so please let me know if there is any way to do it from ADF using data flows.
