
I'm using a service which outputs to an Event Hub.

We want to store that output so it can be read once per day by a batch job running on Apache Spark. Basically, we figured we'd just have all the messages dumped to blobs.

What's the easiest way to capture messages from an Event Hub to Blob Storage?

Our first thought was a Stream Analytics job, but it insists on parsing the raw message as CSV, JSON, or Avro, and our current format is none of those.


Update: We solved this problem by changing our message format. I'd still like to know whether there's any low-impact way to store messages to blobs. Did Event Hubs have a solution for this before Stream Analytics arrived?

  • If your Event Hub serialization format isn't CSV/JSON/Avro, then what is it? Commented Aug 18, 2015 at 12:48
  • @GregGalloway - In fact it's JSON, but prefixed with a C# interface name. Our C# code sniffs that prefix to know which type to deserialize the message into (a sketch of the idea follows these comments). Commented Aug 18, 2015 at 23:50
  • Have you seen this link? I don't have all the answers on how to automate this to run daily or the best way to parse JSON in Spark, but it seems like a good starting point for research, and maybe others can comment: azure.microsoft.com/en-us/documentation/articles/… Commented Aug 19, 2015 at 1:43
  • (Sorry, I keep hitting Enter accidentally :) Cheers. I think we need a long-term record of all the data coming in. We could have Spark Streaming receive it and immediately write it out, but that seems even more of an overkill than the Stream Analytics version already is. Commented Aug 19, 2015 at 1:57
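For illustration, here is a minimal sketch of the sniff-and-deserialize idea described in the comments above; the interface name, type map, and Newtonsoft.Json usage are assumptions for the example, not the asker's actual code:

    // Hypothetical example of the message format described above: a C# interface
    // name prefix followed by a JSON payload, e.g. "IOrderPlaced{\"OrderId\":\"42\"}".
    using System;
    using System.Collections.Generic;
    using Newtonsoft.Json;

    public interface IOrderPlaced { }                 // placeholder message contract
    public class OrderPlaced : IOrderPlaced { public string OrderId { get; set; } }

    public static class PrefixedJsonReader
    {
        // Map of interface-name prefixes to concrete types (illustrative only).
        private static readonly Dictionary<string, Type> TypeMap =
            new Dictionary<string, Type> { { "IOrderPlaced", typeof(OrderPlaced) } };

        public static object Deserialize(string rawMessage)
        {
            // The JSON body starts at the first '{'; everything before it is the prefix.
            int jsonStart = rawMessage.IndexOf('{');
            string prefix = rawMessage.Substring(0, jsonStart);
            return JsonConvert.DeserializeObject(rawMessage.Substring(jsonStart), TypeMap[prefix]);
        }
    }

The prefix is also why a Stream Analytics input can't treat these messages as plain JSON: the leading type name makes each message invalid JSON on its own.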

4 Answers


You could write your own worker process to read the messages off the Event Hub and store them to blob storage. This does not need to happen in real time, because messages stay on the Event Hub for the configured retention period (in days). The client reading the Event Hub is responsible for tracking which messages have been processed, by keeping track of each message's partition ID and offset. There is a C# library that makes this extremely easy and scales really well: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/ A sketch of the approach is below.
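For illustration, a minimal sketch of such a worker built on the EventProcessorHost pattern from the linked article; the hub name, container name, and connection strings are placeholders, and the older Microsoft.ServiceBus.Messaging / WindowsAzure.Storage libraries (current at the time of this question) are assumed:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.ServiceBus.Messaging;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class BlobArchiverProcessor : IEventProcessor
    {
        private CloudBlobContainer _container;

        public Task OpenAsync(PartitionContext context)
        {
            // One container holds the archived messages; create it if it is missing.
            var account = CloudStorageAccount.Parse(Config.StorageConnection);
            _container = account.CreateCloudBlobClient().GetContainerReference("eventhub-archive");
            return _container.CreateIfNotExistsAsync();
        }

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var message in messages)
            {
                // Partition ID + offset gives a unique, replay-safe blob name.
                string blobName = $"{context.Lease.PartitionId}/{message.Offset}";
                var blob = _container.GetBlockBlobReference(blobName);
                byte[] body = message.GetBytes();
                await blob.UploadFromByteArrayAsync(body, 0, body.Length);
            }

            // Checkpoint so a restart resumes after the last stored message.
            await context.CheckpointAsync();
        }

        public Task CloseAsync(PartitionContext context, CloseReason reason)
        {
            return Task.FromResult(true);
        }
    }

    class Program
    {
        static void Main()
        {
            var host = new EventProcessorHost(
                Environment.MachineName,                    // host name
                "myhub",                                    // Event Hub path (placeholder)
                EventHubConsumerGroup.DefaultGroupName,     // consumer group
                Config.EventHubConnection,                  // Event Hub connection string
                Config.StorageConnection);                  // storage for leases/checkpoints

            host.RegisterEventProcessorAsync<BlobArchiverProcessor>().Wait();
            Console.ReadLine();
            host.UnregisterEventProcessorAsync().Wait();
        }
    }

    static class Config
    {
        // Placeholder connection strings; read these from configuration in real code.
        public const string EventHubConnection = "<event hub connection string>";
        public const string StorageConnection = "<storage connection string>";
    }

Because the library checkpoints by partition and offset, the worker can run on a schedule (or be restarted at any time) and will pick up where it left off, as long as it runs within the retention window.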



You can use Event Hubs Capture to capture messages to a blob.

1 Comment

Oh OK - they renamed Event Hubs Archive when it went GA; it's now Event Hubs Capture. Thanks.

You can also do this via an Azure Function (serverless code) which fires from an Event Hub trigger.

Depending on your requirements, this can work better than the Event Hubs Capture feature if you need a capability that Capture doesn't have, such as saving as GZIP or writing to a more customized blob virtual directory structure. A minimal sketch follows the link below.

https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs#trigger-usage
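A minimal sketch of this approach, assuming a C# class-library function with the Event Hubs trigger and an imperative blob binding via IBinder (the Microsoft.Azure.WebJobs.Extensions.EventHubs and Microsoft.Azure.WebJobs.Extensions.Storage packages); the hub name, connection setting name, "archive" container, and ".json.gz" naming are placeholders:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;

    public static class ArchiveEventsFunction
    {
        [FunctionName("ArchiveEvents")]
        public static async Task Run(
            [EventHubTrigger("myhub", Connection = "EventHubConnectionAppSetting")] string message,
            IBinder binder)
        {
            // Custom virtual directory layout: archive/yyyy/MM/dd/<guid>.json.gz
            DateTime now = DateTime.UtcNow;
            string blobPath = $"archive/{now:yyyy}/{now:MM}/{now:dd}/{Guid.NewGuid():N}.json.gz";

            // Bind the output blob at runtime so the path can be computed per message,
            // and GZIP-compress the payload on the way out.
            using (var blobStream = await binder.BindAsync<Stream>(
                       new BlobAttribute(blobPath, FileAccess.Write)))
            using (var gzip = new GZipStream(blobStream, CompressionLevel.Optimal))
            {
                byte[] payload = Encoding.UTF8.GetBytes(message);
                await gzip.WriteAsync(payload, 0, payload.Length);
            }
        }
    }

The date-based folder structure and per-message compression are exactly the kind of customization mentioned above that the built-in Capture feature doesn't offer.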



Azure now has this built-in: Event Hubs Archive (in preview)
