I am looking to create a user defined aggregate function in KQL to be used in a summarize function. I want to calculate the average value of a device sensor, but I need it to include the duration of time when a sensor has a specific value.

I've searched online, in the Microsoft documentation and StackOverflow, but I am unable to find any example that may help.

Note: The code examples in this question are slimmed down to just include the relevant information.

My table setup is as follows:

.create table tempMetrics (
    TimeStamp: datetime,
    Name: string,
    Value: decimal
    )

Let's use this as input data:

.ingest inline into table tempMetrics <|
datetime(2024-01-03 10:00:00),Sensor1,1
datetime(2024-01-03 10:03:00),Sensor1,5
datetime(2024-01-03 10:05:00),Sensor1,3
datetime(2024-01-03 10:06:00),Sensor1,7
datetime(2024-01-03 10:00:00),Sensor2,5
datetime(2024-01-03 10:02:00),Sensor2,6
datetime(2024-01-03 10:05:00),Sensor2,3
datetime(2024-01-03 10:09:00),Sensor2,1

I want to calculate the average value per five minutes, per metric. A simple query to get this would be:

tempMetrics
| summarize Average=avg(Value) by bin(TimeStamp, 5m), Name

However, I want it to be the average over time. So if a sensor measures 1 at the start of the time block, and three minutes in the value changes to 5, the average shouldn't be 3, but 2.6.

The query above gives the following results. The "Should be" column shows the time-weighted value I am after.

TimeStamp             Name     Average  Should be
2024-01-03T10:00:00Z  Sensor1  3        2.6
2024-01-03T10:00:00Z  Sensor2  5.5      5.6
2024-01-03T10:05:00Z  Sensor1  5        6.2
2024-01-03T10:05:00Z  Sensor2  2        2.6
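
As a quick check of the first row, each reading can be weighted by the number of minutes it was held. This is only a sketch of the arithmetic: the datatable and its Minutes column are hypothetical and typed in by hand, not derived from tempMetrics.

```kql
// Hypothetical hand-built rows for Sensor1 between 10:00 and 10:05.
datatable(Value: real, Minutes: real)
[
    // value 1 held from 10:00 to 10:03
    1.0, 3.0,
    // value 5 held from 10:03 to 10:05
    5.0, 2.0
]
// PlainAverage is (1 + 5) / 2 = 3.
// TimeWeightedAverage is (1*3 + 5*2) / 5 = 2.6.
| summarize
    PlainAverage = avg(Value),
    TimeWeightedAverage = sum(Value * Minutes) / sum(Minutes)
```

The same weighting applied to the other three rows gives the remaining values in the "Should be" column.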

I want to solve this using a user defined aggregate function, because this would keep the query clean.

Does anyone have any experience with this?

3 Answers

I suggest using the range() function to create a time series for each sensor, and then summarizing the average by time slot and Name.

tempMetrics
| extend TimeSlot = range(bin(TimeStamp, 5m), bin(TimeStamp + 5m, 5m), 1m)
| summarize Average = avg(Value) by TimeSlot, Name

EDIT: This solution assumes that the time intervals are aligned, and it might need adjustments.

1 Comment

This seems to create a series for each sensor, regardless of a value that may overlap. Unfortunately the intervals are not aligned. The devices generating this data will send it when a sensor value changes and it can be completely random.

I have been in contact with a Microsoft Cloud Solution Architect who is assisting us, and he has confirmed that it is not possible to create a user-defined aggregate function.

The query is to be used in a Materialized View, so serialization is not possible (order by, partition, etc.). For this reason I was looking into creating a user defined function. As this is not possible, I will look into other possibilities.

1 Comment

Okay. Let us know what other approaches you find to solve this.

Creating a user-defined aggregate function for this average is not possible. You can build the logic directly from existing Kusto functions instead. The code is below.

let tempMetrics = datatable(TimeStamp:datetime, Name:string, Value:decimal)
[
    datetime(2024-01-03 10:00:00), 'Sensor1', 1,
    datetime(2024-01-03 10:03:00), 'Sensor1', 5,
    datetime(2024-01-03 10:05:00), 'Sensor1', 3,
    datetime(2024-01-03 10:06:00), 'Sensor1', 7,
    datetime(2024-01-03 10:00:00), 'Sensor2', 5,
    datetime(2024-01-03 10:02:00), 'Sensor2', 6,
    datetime(2024-01-03 10:05:00), 'Sensor2', 3,
    datetime(2024-01-03 10:09:00), 'Sensor2', 1,
];
tempMetrics
| partition hint.strategy=native by Name
(
    make-series Value = max(Value) default=long(null) on TimeStamp from datetime("2024-01-03 10:00:00") to datetime("2024-01-03 10:10:00") step 1m
    | project Name, Value, TimeStamp
)
| mv-expand TimeStamp, Value = series_fill_forward(Value)
| extend TimeStamp = todatetime(TimeStamp), Value = tolong(Value)
| summarize Average = avg(Value) by bin(TimeStamp, 5m), Name
| where isnan(Average) == 0

This code uses make-series to build a one-minute time series of the maximum value for each sensor, leaving minutes without a reading as null. series_fill_forward then fills those gaps with the last known value, and mv-expand turns the series back into one row per minute. Finally, summarize calculates the average value per five minutes, per sensor.

Output

TimeStamp             Name     Average
2024-01-03T10:00:00Z  Sensor1  2.6
2024-01-03T10:00:00Z  Sensor2  5.6
2024-01-03T10:05:00Z  Sensor1  6.2
2024-01-03T10:05:00Z  Sensor2  2.6
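
The same fill-forward idea can also be sketched without the partition operator, since make-series accepts a by clause of its own. The version below is a minimal, untested sketch under a few assumptions that are not in the original: Value is declared as real rather than decimal, readings fall on whole minutes, and the time range is hard-coded to the sample data. Nothing in this thread confirms that a query of this shape is accepted in a materialized view.

```kql
// Sample rows from the question; Value is declared as real here (assumption).
let tempMetrics = datatable(TimeStamp: datetime, Name: string, Value: real)
[
    datetime(2024-01-03 10:00:00), 'Sensor1', 1.0,
    datetime(2024-01-03 10:03:00), 'Sensor1', 5.0,
    datetime(2024-01-03 10:05:00), 'Sensor1', 3.0,
    datetime(2024-01-03 10:06:00), 'Sensor1', 7.0,
    datetime(2024-01-03 10:00:00), 'Sensor2', 5.0,
    datetime(2024-01-03 10:02:00), 'Sensor2', 6.0,
    datetime(2024-01-03 10:05:00), 'Sensor2', 3.0,
    datetime(2024-01-03 10:09:00), 'Sensor2', 1.0
];
tempMetrics
// One point per minute and per sensor; minutes without a reading stay null.
| make-series Value = max(Value) default = real(null)
    on TimeStamp from datetime(2024-01-03 10:00:00) to datetime(2024-01-03 10:10:00) step 1m
    by Name
// Carry the last known reading forward into the empty minutes.
| extend Value = series_fill_forward(Value)
// Turn the two parallel arrays back into one row per minute.
| mv-expand TimeStamp to typeof(datetime), Value to typeof(real)
// Every minute now carries a value, so a plain avg is weighted by duration.
| summarize Average = avg(Value) by bin(TimeStamp, 5m), Name
```

Because the grid is one minute wide, a reading that arrives between two grid points is only approximated. A smaller step reduces that error at the cost of more expanded rows.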

1 Comment

This may work, and there are a lot of ways to calculate the timed average. I want to store this in a Materialized View however and that will not be possible with a partition statement. I will further investigate and post on StackOverflow if any useful technique is found.
