Kusto KQL: how to check if JSON array in dataset contains element of another array?

Question

The dataset (table) I'm querying has a column containing a JSON string array.
I have a fixed list of verbs that I need to check against each row in the table. I want to find the rows where at least one item in the JSON list starts with one of the verbs from the fixed list.

// Verbs to look for (actual list is longer).
let verbs = datatable (verb : string) [
"discover",
"gain"
];

// Data. Second column is a JSON string.
let data = datatable(id : int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
];

// Query: I need to know if at least one of the items in the "json" column starts
// with one of the verbs of the "verbs" list.
data
| extend parsedJson = parse_json(json)
| extend OneOrMoreListItemsHaveVerb = false
| project id, OneOrMoreListItemsHaveVerb

I tried to use mv-apply but failed because I'm comparing two lists/arrays against each other, not one array against a single item.

For the example data above, I expect the rows with IDs 1 and 3 to be returned: the first element of row 1 starts with "Discover", and the second element of row 3 starts with "gain".

Yoni L. · Accepted Answer · 2022-05-23 17:07:05Z


You could create an array from your verbs table (e.g. using summarize make_set()), then iterate over it with mv-apply for each row of the input data, using a nested mv-apply to iterate over the items of the JSON array.

For example:

let verbs = datatable (verb: string) [
    "discover", "gain"
]
;
let verbs_list = toscalar(verbs | summarize make_set(verb))
;
let data = datatable(id: int, json: string) [
    1, "[\"Discover me\", \"some text\"]",
    2, "[\"All good\", \"no invalid verbs\"]",
    3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
data
| mv-apply verb = verbs_list on (
    mv-apply input = parse_json(json) on (
        where input startswith verb
    )
)
| project ['id'], json

id	json
1	["Discover me", "some text"]
3	["first element fine", "gain power isn't ok"]

Alternatively, you can implement similar logic using the partition operator:

let verbs = datatable (verb: string) [
    "discover", "gain"
]
;
let data = datatable(id: int, json: string) [
    1, "[\"Discover me\", \"some text\"]",
    2, "[\"All good\", \"no invalid verbs\"]",
    3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
verbs
| partition by verb
{
    data
    | mv-apply input = parse_json(json) on(
        where input startswith verb
    )
    | project ['id'], json
}

id	json
1	["Discover me", "some text"]
3	["first element fine", "gain power isn't ok"]
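Both queries above return only the matching rows. The question's skeleton also asks for a boolean OneOrMoreListItemsHaveVerb column on every row; a minimal sketch of that variant follows. It is not part of the original answer. It assumes that expanding both arrays with mv-expand and aggregating per id is acceptable for the data size, and the column names input and verb are illustrative.

```kusto
let verbs = datatable (verb: string) [
    "discover", "gain"
];
// Collapse the verbs table into a single dynamic array.
let verbs_list = toscalar(verbs | summarize make_set(verb));
let data = datatable(id: int, json: string) [
    1, "[\"Discover me\", \"some text\"]",
    2, "[\"All good\", \"no invalid verbs\"]",
    3, "[\"first element fine\", \"gain power isn't ok\"]",
];
data
// One row per item of the JSON array.
| mv-expand input = parse_json(json) to typeof(string)
// One row per (item, verb) pair.
| mv-expand verb = verbs_list to typeof(string)
// startswith is case-insensitive, so "Discover me" matches "discover".
| summarize OneOrMoreListItemsHaveVerb = countif(input startswith verb) > 0 by ['id']
```

For the sample data, this should flag ids 1 and 3 as true and id 2 as false. Because the result is aggregated by id, each id appears once even when several items or verbs match.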


3 Comments

Krumelur Over a year ago

Oh, I see. Using nested mv-apply did not occur to me. I'll try this (my) tomorrow.

Krumelur Over a year ago

In your first approach, what is the toscalar() call for and why are you using make_set() and not make_list()?

Yoni L. Over a year ago

"make_set()" is used in order to eliminate any potential duplicates in your source data; "toscalar()" is used in order to make the list a scalar dynamic-typed argument that "mv-apply" can accept.
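The difference described in this comment can be seen in a small sketch. It is an illustration rather than part of the original answer: the duplicated "discover" entry is added to the verbs table on purpose, and the column names as_list, as_set, and scalar_set_length are hypothetical.

```kusto
// "discover" is listed twice on purpose to show the deduplication.
let verbs = datatable (verb: string) [
    "discover", "gain", "discover"
];
// toscalar() turns the one-row, one-column result into a dynamic constant
// that can be used in scalar expressions and passed to mv-apply.
let verbs_list = toscalar(verbs | summarize make_set(verb));
verbs
| summarize as_list = make_list(verb), as_set = make_set(verb)
| extend scalar_set_length = array_length(verbs_list)
```

Here as_list keeps all three entries, while as_set and verbs_list hold each distinct verb once.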
