
I am new to Azure Data Factory. The requirement I am working on is to read records from an Azure SQL database and send those records as JSON to an external HTTP API.

So, I am using a Lookup activity to read records from the SQL table and a ForEach activity to loop through them and pass each record to the HTTP API (POSTed as the request body) via a Web activity. This is working fine. Please note that the API accepts only one record in the body, as it doesn't support an array of records in a single call.

But before passing the records to the HTTP API, I have to generate a UUID for each record (unique across all records) and pass it as a header value in the HTTP POST request. Once the API responds successfully, I have to write the UUID that was generated for that request back into the SQL table.

I am facing the below issues:

  1. Reading the SQL table with a Lookup activity is not feasible, as it supports only 5000 rows, whereas the table I am reading has more than 100k records. So how should I read the data and pass it to the ForEach activity?
  2. How do I generate a UUID for each record read from the table? If I have 100k records, I must generate 100k unique values, keeping in mind the parallel execution of the ForEach activity (using batch count).
  3. How do I pass that UUID as a header value to each request in the Web activity?
  4. If the API response is successful, how do I retrieve the UUID for that request and insert it into the SQL table?

1 Answer


You can use a Script activity followed by a ForEach activity.

First, use the Script activity to retrieve the records from the database and pass them to the ForEach activity, which creates a unique id for each record and calls the Web activity. Then, based on the Web activity's result, you update the table with the unique id.

But make sure you have a uuid column in your table.

Script activity


Configure the linked service to your SQL database and add the query to retrieve the records.
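
For example, assuming a hypothetical Employees table with the columns used later in this answer (id, name, salary, age, uuid), the query could be something like:

SELECT id, name, salary, age
FROM Employees
WHERE uuid IS NULL;

The WHERE uuid IS NULL filter is optional; it simply lets you re-run the pipeline and pick up only the rows that have not been sent yet.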

ForEach activity


Add the batch count and the items from the Script activity.

Items: @activity('Script1').output.resultSets[0].rows
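
That expression works because the Script activity output is shaped roughly like the below (illustrative values, other properties omitted):

{
    "resultSets": [
        {
            "rowCount": 2,
            "rows": [
                { "id": 1, "name": "Alice", "salary": 50000, "age": 30 },
                { "id": 2, "name": "Bob", "salary": 60000, "age": 35 }
            ]
        }
    ]
}

So resultSets[0].rows hands the ForEach activity one object per table row.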

Next, inside the ForEach activity:

Set Variable activity to create the unique id.


Create a new pipeline variable UUID and, in the dynamic expression for its value, add the function @guid().

Next, the Web activity.


Here, add the below details.

URL: your POST request URL
Method: POST
Body:

@concat(
    '{',
    '"id": "', item().id, '", ',
    '"name": "', item().name, '", ',
    '"salary": "', item().salary, '", ',
    '"age": "', item().age, '", ',
    '"uuid": "', variables('UUID'), '"',
    '}'
)

Authentication: your authentication

Headers: Name: uuidHeader, Value: @variables('UUID')

Here, I built the body (including the uuid) according to my API; configure it according to your API. Also add the uuid in the Headers section as shown above, giving it a name.
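
For one sample row (illustrative values only, including the GUID), the request produced by the above configuration would be roughly:

POST <your API URL>
uuidHeader: 3f2504e0-4f89-41d3-9a0c-0305e82c3301

{
    "id": "1",
    "name": "Alice",
    "salary": "50000",
    "age": "30",
    "uuid": "3f2504e0-4f89-41d3-9a0c-0305e82c3301"
}

Note that every value is sent as a string because the expression wraps each one in quotes; drop the quotes around the numeric fields if your API expects numbers.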

Also, make sure your REST API response contains the uuid and the record id, because you need the record id when updating the table.
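
For instance, a response shaped like the following (hypothetical; your API will differ) carries everything needed:

{
    "id": 1,
    "uuid": "3f2504e0-4f89-41d3-9a0c-0305e82c3301",
    "status": "success"
}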

Next, a Script activity to update the table according to the current row id.


Below is the query.

@concat(
    'UPDATE Employees SET uuid = ''', 
    variables('UUID'), 
    ''' WHERE id = ', 
    item().id, 
    ';'
)
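
For a row with id 1 and the illustrative UUID from earlier, this expression renders to:

UPDATE Employees SET uuid = '3f2504e0-4f89-41d3-9a0c-0305e82c3301' WHERE id = 1;

The values are concatenated straight into the SQL text here; since they come only from your own table and the @guid() variable that is workable, but if any value could contain quotes, consider the Script activity's script parameters instead.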

Output:


And in the pipeline:


Make sure your HTTP API can handle this many requests and return the responses.


8 Comments

Thanks for the detailed explanation. I have two questions: Is the generated uuid guaranteed to be unique? Also, I read in the documentation that ForEach only supports a max of 100k items. As a workaround, we may go with two pipelines (outer and inner), but will that not add complexity? How about doing the above with Databricks/Spark and using ADF to orchestrate it?
Yes, it creates a unique id. Spark makes it a lot easier: you can partition the data, call the API in parallel, and update the records.
But make sure your HTTP API can handle that kind of load, since it is invoked continuously.
Do the above activities run on Spark? I thought only Data Flows run on Spark. Is ADF (without Spark) performant enough to handle 1 million records?
Only Data Flow uses a Spark cluster for loading and transformation. ADF also handles such large data, but Data Flow has more options to increase performance, like partitioning. What is your use case: only copying the data, or do you need to do transformations?
