
We have a use case in our data projects where feeds arriving from a source system via live streaming may detect certain issues and resend the same transaction with a flag indicating that the transaction has been corrected.

The question is: does Databricks provide an option to overwrite the earlier transaction so that I do not end up with duplicates of the same transaction? Has anyone encountered this scenario, and what approaches have you tried? I have come across a Databricks community blog that talks about using MERGE statements. Is that the only option, or have you implemented something else?

  • Yes, you can avoid inserting duplicate records into a Delta table using the merge option. By the way, can you please confirm what limitation you are seeing with this option? – chandresh_cool Commented May 15, 2024 at 12:01

1 Answer


Yes, definitely the best possible solution is to use a MERGE statement with a good matching condition. The beauty of a MERGE statement is that you can decide what happens when there is a match and when there is not. Basically, in your scenario, when there is a match you would simply overwrite the earlier record with the corrected one (or ignore the incoming duplicate). Please check this documentation.
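For illustration, a Delta Lake MERGE for this pattern might look like the sketch below. The table and column names (`transactions`, `incoming`, `transaction_id`, `is_corrected`) are assumptions for the example, not from the original question:

```sql
-- Assumed schema: target Delta table `transactions` keyed by transaction_id;
-- `incoming` holds the newly arrived batch, with is_corrected flagging resends.
MERGE INTO transactions AS t
USING incoming AS s
ON t.transaction_id = s.transaction_id
WHEN MATCHED AND s.is_corrected = true THEN
  UPDATE SET *    -- overwrite the earlier transaction with the corrected version
WHEN NOT MATCHED THEN
  INSERT *        -- first time this transaction is seen
```

If you instead want to silently drop matched duplicates rather than overwrite them, omit the `WHEN MATCHED` clause entirely; unmatched rows are still inserted.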

I wouldn't suggest other options, as MERGE is the way to go.

