
My timeseries table T has the columns: location, sensor_id, timestamp and value. The table has thousands of sensor_ids, billions of values per year, and about 100 locations. The 100 locations are spread across 6 countries.

For table T there exist about 10 user groups, and each user group is only allowed to see locations from particular countries. For example:

  • user group 1 can only see locations in Spain and France
  • user group 2 can only see locations in Spain and Germany

To ensure that we are data compliant we want to use row level security. One approach is to hard-code the locations for each user group:

.create-or-alter function My_RLS_function(TableName:string) {
    table(TableName)
    | where (current_principal_is_member_of('aadgroup=user_group_1') and location in ('Location1', 'Location3', ..., 'Location50'))
      or ...
      or ...
}

We think that this is the most query-optimized approach; however, typing in each location would be messy and error-prone. It would also be cumbersome to add or remove a location.

Therefore, our approach is to create a new table called location_and_countries. This is a table of 100 rows where the first column is country and the second column is location. Assuming user group 1 can see locations in Spain and France, the RLS function becomes:

.create-or-alter function My_RLS_function(TableName:string) {
    table(TableName)
    | where (current_principal_is_member_of('aadgroup=user_group_1') and location in (toscalar(location_and_countries | where country in ('Spain', 'France') | summarize make_list(location))))
      or ...
      or ...
}
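
For illustration, a minimal sketch of how the location_and_countries lookup table could be created and populated. Only the table name and the country/location columns come from the description above; the commands and sample rows are assumptions:

.create table location_and_countries (country: string, location: string)

.ingest inline into table location_and_countries <|
Spain,Location1
Spain,Location3
France,Location7
Germany,Location12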

What we are wondering about now is the following:

  1. Does anyone know a more query-optimized way to store and retrieve metadata to be used in KQL queries?
  2. Is there a way we can hide the table location_and_countries so that the user cannot query it?
Comments

  • There are a couple of ways to store and retrieve metadata to be used in KQL queries. One way is to use Azure Key Vault to store the metadata and retrieve it using the Key Vault REST API. Another way is to use Azure App Configuration to store the metadata and retrieve it using the App Configuration REST API. Both of these methods are query optimized and secure. As for hiding the table location_and_countries so that the user cannot query it, you can use Azure RBAC to restrict access to the table. Commented May 2, 2024 at 9:15
  • You can assign the appropriate RBAC roles to the users who need access to the table and deny access to all other users. Commented May 2, 2024 at 9:15

1 Answer


I had a similar challenge: millions of sensor data rows from thousands of systems, and each user must only see data that belongs to systems in their country. Here's how I solved it.

The users of each country get an Active Directory group, e.g. group_DE, group_US, group_CN, etc. The cornerstone is to use the table() function to reference the protected table dynamically and filter it with my countries-to-systems lookup table. Here's the full code:

.create-or-alter function with (folder = "meta/security", docstring = "Enable row level security", skipvalidation = "true") rlsexample(tablename:string) {
    // check which country groups (and the data-scientist group) the calling user belongs to
    let country_de = current_principal_is_member_of('aadgroup=user_de');
    let country_us = current_principal_is_member_of('aadgroup=user_us');
    let country_cn = current_principal_is_member_of('aadgroup=user_cn');
    // other countries...
    let isDatascientist = current_principal_is_member_of('aadgroup=user_datascientist;5eejkfhgdjfhg3');
    // look up the system ids that belong to each country
    let sys_de = countrytable | where country_id == "DE" | project systemid;
    let sys_us = countrytable | where country_id == "US" | project systemid;
    let sys_cn = countrytable | where country_id == "CN" | project systemid;
    let basetable = table(tablename);
    // each branch only returns rows if the user is in the matching group
    let countrygroup_de = basetable | where country_de | where systemid in (sys_de);
    let countrygroup_us = basetable | where country_us | where systemid in (sys_us);
    let countrygroup_cn = basetable | where country_cn | where systemid in (sys_cn);
    let admin = basetable | where isDatascientist;
    union countrygroup_de, countrygroup_us, countrygroup_cn, admin
}

Let's have a closer look at what happens:

  1. the input table is our sensor table ("tablename")
  2. we check the user's permissions (country_de, ...)
  3. we query the country lookup table (a sketch of this table follows the list)
  4. we query the sensor table for each country. Note that the country table and "tablename" MUST share the same unique systemid
  5. we filter the sensor values down to those systems that are located in the respective country
  6. finally we do a union of all the generated tables
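
For context, here is a minimal sketch of the countrytable lookup table the function queries. Only the country_id and systemid columns are taken from the code above; the commands and sample rows are assumptions:

.create table countrytable (country_id: string, systemid: string)

.ingest inline into table countrytable <|
DE,sys-0001
DE,sys-0002
US,sys-0101
CN,sys-0201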

My assumption is that a user can have multiple countries assigned, but the system that is sending sensor data has to be located in exactly one country. Then the union returns each row from exactly one of the country groups.

Please have a look at the query execution path. It is really important to know that Kusto will not actually query billions of sensor values over and over again. Instead, the engine knows that it only has to bring one of the filtered sensor tables into memory, so you are effectively querying only the data you need, not more.
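
To illustrate that gating pattern in isolation, here is a minimal, hypothetical sketch (MyTable, region and the guard values are made up; in the real function the guards come from current_principal_is_member_of()). A branch whose guard is false contributes no rows to the union:

let isGroupA = false;   // guard that the RLS function derives from group membership
let isGroupB = true;
let branchA = MyTable | where isGroupA | where region == "A";
let branchB = MyTable | where isGroupB | where region == "B";
union branchA, branchB  // only branchB contributes rows here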

Finally, put the function rlsexample in place as a row level security policy on your sensor data table or materialized view:

.alter materialized-view mysensorvalues policy row_level_security enable "rlsexample(mysensorvalues)"
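
If the sensor data lives in a plain table rather than a materialized view, the equivalent command would be (assuming the table is named mysensorvalues):

.alter table mysensorvalues policy row_level_security enable "rlsexample(mysensorvalues)"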