
The following Stream Analytics query counts the number of events, grouped by IP address, over 10-second sliding windows:

SELECT
    Min(Time) AS FirstHit,
    Max(Time) AS LastHit,
    Count(*) AS Total,
    IPAddress
FROM
    Input PARTITION BY PartitionId TIMESTAMP BY Time
GROUP BY
    SlidingWindow(second, 10), IPAddress, PartitionId
HAVING
    Total >= 10

The resulting aggregation is output to an Event Hub.
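For reference, the sliding-window semantics can be simulated outside Stream Analytics. This is a minimal Python sketch, not the actual Stream Analytics engine: for each event, it counts the events in the 10-second window ending at that event and applies the `HAVING Total >= 10` filter, using the ten 1-second-apart timestamps from the payload in this question.

```python
from datetime import datetime, timedelta

# The ten events from the payload: one per second, 11:40:01 through 11:40:10.
events = [datetime(2016, 9, 2, 11, 40, s) for s in range(1, 11)]

WINDOW = timedelta(seconds=10)
THRESHOLD = 10

# SlidingWindow(second, 10): for each event, consider the window
# (event_time - 10s, event_time] and emit when HAVING Total >= 10 is met.
results = []
for t in events:
    hits = [e for e in events if t - WINDOW < e <= t]
    if len(hits) >= THRESHOLD:
        results.append({
            "FirstHit": min(hits),
            "LastHit": max(hits),
            "Total": len(hits),
        })

# Only the window ending at 11:40:10 contains all ten events,
# so exactly one aggregate should be emitted.
print(results)
```

Run against this payload, the sketch emits a single aggregate, which is the output the question expects from the real job.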

The following JSON payload of 10 simple objects, spaced precisely 1 second apart within a 10-second window, is published to Stream Analytics to be processed by the above query:

[{
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:01"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:02"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:03"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:04"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:05"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:06"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:07"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:08"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:09"
}, {
    "IPAddress": "192.168.0.10",
    "Time": "2016-09-02T11:40:10"
}]
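A payload like this can be generated programmatically. A minimal Python sketch, with the IP address and start time taken from the sample above:

```python
import json
from datetime import datetime, timedelta

start = datetime(2016, 9, 2, 11, 40, 1)

# Ten events, one second apart, with Time in ISO 8601 format
# to match the column referenced by TIMESTAMP BY Time.
payload = [
    {
        "IPAddress": "192.168.0.10",
        "Time": (start + timedelta(seconds=i)).isoformat(),
    }
    for i in range(10)
]

print(json.dumps(payload, indent=4))
```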

Both the Stream Analytics job and the Event Hub instance are newly instantiated.

Stream Analytics does not output a corresponding event, despite the fact that the event payload conforms to the query criteria.

However, upon publishing the same payload to Stream Analytics a second time, an output event is created with the correct metadata.

Is there a discrepancy in my configuration, or some sort of bootstrapper/warm-up/offset feature of Stream Analytics that results in the first payload being effectively ignored?

1 Answer

My guess is that you are interacting with it in the following order:

  1. Events are sent to the Event Hub.
  2. The Stream Analytics job is started from the current time.
  3. Further events are sent to the Event Hub.

Because an Event Hub is a stream, it doesn't have a concept of "current time"; Stream Analytics needs the correct offset to start processing events.
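One way to picture this hypothesis: if the job's starting offset falls after the position at which the first batch was enqueued, those events are simply never read. A hypothetical Python sketch, where the offsets are illustrative integers rather than real Event Hub offsets:

```python
# Each event carries the stream offset at which it was enqueued
# (illustrative numbers, not real Event Hub offsets).
first_batch = [{"offset": i, "IPAddress": "192.168.0.10"} for i in range(10)]
second_batch = [{"offset": 10 + i, "IPAddress": "192.168.0.10"} for i in range(10)]

# The job is started "from the current time", i.e. from the stream position
# at start-up, which here falls after the first batch was enqueued.
job_start_offset = 10

def visible(batch, start):
    # The reader only sees events at or after its starting offset.
    return [e for e in batch if e["offset"] >= start]

print(len(visible(first_batch, job_start_offset)))   # first payload: nothing is read
print(len(visible(second_batch, job_start_offset)))  # second payload: all ten are read
```

Under this model the first payload produces no output, not because it fails the query, but because the job never reads it.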

If you give us more details we can confirm this or investigate the problem further.

As mentioned in my comment: from the input blade of Stream Analytics in the portal, you can sample data from the input and feed it to your query to make sure the results you are expecting are there (the time window will be ignored, but you can sample just the time window you want from the input first).

Also, since you are specifying the TIMESTAMP BY clause, make sure your input is configured with a tolerance for out-of-order events in case ordering is not guaranteed ( https://msdn.microsoft.com/en-us/library/azure/mt674682.aspx ).

More details on the late arrival/out of order settings for the input: https://blogs.msdn.microsoft.com/streamanalytics/2015/05/17/out-of-order-events/


6 Comments

"Stream Analytics job is started from the current time" — all events sent to the Event Hub are initialised with a Time property set to a value between the Stream Analytics start time and the current time. Let me know what detail you need and I'll happily provide it.
@PaulMooney I'm a bit busy now, but have a look at this: stackoverflow.com/questions/30646588/… I'm not sure you can override the time of the message. One thing to try is to start the job from an older date and see if you get any output.
Another way to diagnose is to sample the input data from the portal, process it in the test blade of the Stream Analytics query, and check the output to make sure the messages you are expecting match the query.
In C# the system property EnqueuedTimeUtc doesn't have a public setter; are you hacking it by setting the SystemProperties? (I'm not sure about other languages, but they probably reflect this.) Even if you could change it, I'm not sure changing the date on messages is supported.
I'm leveraging a "TimeStamp by Time" clause in the Stream Analytics Query, where "Time" refers to the Time property in each JSON event. I'm setting this manually in ISO 8601 format. Not changing the EnqueuedTimeUtc property. Running the data against the test blade yields the correct results.
