
How do we save data inside of an XML payload to blob storage?

Input:

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>sdfsdfjijOIJOISJDFQPWORPJkjsdlfkjlksdf==</bytes>
</root>

Desired result:

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>/blob/path/toSavedPayload</bytes>
</root>
  1. Save the bytes somewhere in blob storage.
  2. Replace the bytes with the URI of where they were saved.

How do we use Data Factory to extract a node from the XML and save it to blob storage?

  • Need a bit more info - where is the XML payload defined? Do you mean it's a file in the data lake, or is it returned in the ADF pipeline from a web or database call? Commented Jan 17, 2020 at 15:08
  • The XML blob is stored in a storage account; I'm looking for functionality similar to a regular XSLT transform. Commented Jan 17, 2020 at 15:11
  • I used Azure Batch Custom Activity to solve this problem. Here is another SO answer where I mentioned that: stackoverflow.com/questions/59938374/… Commented Mar 6, 2020 at 16:48

3 Answers


Currently, ADF doesn’t support XML natively, but you have a couple of options:

  1. Write your own code and run it through an ADF custom activity (see the sketch below).
  2. SSIS has built-in support for XML as a source, so that may be worth a look.
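A rough sketch of what that custom-activity code could look like, using the same legacy Microsoft.WindowsAzure.Storage SDK as the answer further down; the file paths, connection string, and container/blob names are placeholders, not anything from the question:

using System;
using System.Xml.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Load the XML payload (here from a local file; in a custom activity it would
// typically come from the linked storage account).
XDocument doc = XDocument.Load(@"payload.xml");
XElement bytesNode = doc.Root.Element("bytes");

// Decode the base64 content of <bytes> and upload it as a block blob.
byte[] payload = Convert.FromBase64String(bytesNode.Value);
CloudStorageAccount account = CloudStorageAccount.Parse("<storage-connection-string>");
CloudBlockBlob blob = account.CreateCloudBlobClient()
                             .GetContainerReference("payloads")
                             .GetBlockBlobReference("toSavedPayload");
blob.UploadFromByteArray(payload, 0, payload.Length);

// Swap the raw bytes for the URI of the uploaded blob and write the XML back out.
bytesNode.Value = blob.Uri.ToString();
doc.Save(@"payload-out.xml");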

4 Comments

Just wondering, why are you mentioning SSIS? Are you saying we can export from SSIS and import into ADF?
Yes, you can "lift and shift" your existing SSIS packages to a cloud instance of integration services. Read more here: learn.microsoft.com/en-us/sql/integration-services/lift-shift/…
@MartinEstebanZurita would love an example of an SSIS package that does an XML transform and then how that can be lifted and shifted into ADF

For this case you will have to use some custom code. I would choose between these options:

  • Azure Functions - fine for simpler data processing (a minimal sketch follows below)
  • Azure Databricks - for cases where you need to process big XML data
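If the Azure Functions route fits, a minimal blob-triggered sketch (in-process C#) could look like the following; the container names, blob paths, and function name are assumptions for illustration only:

using System;
using System.IO;
using System.Xml.Linq;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class SplitXmlPayload
{
    // Fires when an XML payload lands in the "incoming" container, writes the
    // decoded <bytes> to "payloads", and writes the rewritten XML (bytes
    // replaced by the blob path) to "processed".
    [FunctionName("SplitXmlPayload")]
    public static void Run(
        [BlobTrigger("incoming/{name}.xml")] Stream input,
        [Blob("payloads/{name}.bin", FileAccess.Write)] Stream payloadOut,
        [Blob("processed/{name}.xml", FileAccess.Write)] Stream xmlOut,
        string name,
        ILogger log)
    {
        XDocument doc = XDocument.Load(input);
        XElement bytesNode = doc.Root.Element("bytes");

        // Decode and persist the binary payload.
        byte[] payload = Convert.FromBase64String(bytesNode.Value);
        payloadOut.Write(payload, 0, payload.Length);

        // Replace the raw bytes with the path of the blob just written;
        // prepend your account/container URI if the full URL is needed.
        bytesNode.Value = $"/payloads/{name}.bin";
        doc.Save(xmlOut);

        log.LogInformation($"Extracted {payload.Length} bytes from {name}.xml");
    }
}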



As Azure Data Factory does not support XML natively, I would suggest you go with an SSIS package.

  1. In a Data Flow task, use an XML source and read the bytes from the XML into a variable of the DT_Image data type.
  2. Create a Script task that uploads the byte array (DT_Image) obtained in step 1 to Azure Blob Storage, as shown below. The code is slightly modified from the referenced SO post to fit this requirement.
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("StorageKey");

// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");

// Retrieve reference to a blob named "myblob".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob");

// Read the byte array from the SSIS variable populated in step 1.
byte[] byteArrayIn = (byte[])Dts.Variables["User::ImageVariable"].Value;

// Create or overwrite the "myblob" blob with the contents of the byte array.
using (var memoryStream = new MemoryStream(byteArrayIn))
{
    blockBlob.UploadFromStream(memoryStream);
}
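The upload alone covers only the first half of the question; the <bytes> node still has to be replaced with the blob's URI. One hedged way to continue the same Script task is to hand the URI back to the package through another SSIS variable, which a later task can use to rewrite the XML. The variable name User::BlobUriVariable is a placeholder you would need to create in the package:

// Pass the URI of the uploaded blob back to the package so a later task
// (e.g. an XML task or another script) can rewrite the <bytes> node.
// "User::BlobUriVariable" is a placeholder variable name.
Dts.Variables["User::BlobUriVariable"].Value = blockBlob.Uri.ToString();
Dts.TaskResult = (int)ScriptResults.Success;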
  3. Now, host this SSIS package in the SSIS Integration Runtime in Azure Data Factory and execute it.

SSIS Integration Runtime in Azure Data Factory
