I have a table with following columns:
- ID: identifies an imported document (think filename). This is unique for a combination of ImportId and ImportTime.
- SomeData: some data column. In the real table there are more columns.
- ImportId: an ID in the format of YYYY-MM-DD, eg "2022-05-14" (this is a string column)
- ImportTime: the date and time the import was done (this is a string column)
The RowNum is NOT part of the table but used here to be able to reference records/rows.
| RowNum | ID | Value | ImportId | ImportTime |
|---|---|---|---|---|
| 1 | A | Doc A content as of May 11, 2022 | 2022-05-11 | 2022-05-11 13:00 |
| 2 | B | Doc B content as of May 11, 2022 | 2022-05-11 | 2022-05-11 13:00 |
| 3 | A | Doc A content as of May 11, 2022 | 2022-05-11 | 2022-05-11 17:00 |
| 4 | B | Doc B content as of May 11, 2022 | 2022-05-11 | 2022-05-11 17:00 |
| 5 | A | Doc A content as of May 14, 2022 | 2022-05-14 | 2022-05-17 08:00 |
| 6 | B | Doc B content as of May 14, 2022 | 2022-05-14 | 2022-05-17 08:00 |
| 7 | A | Doc A content as of May 14, 2022 | 2022-05-14 | 2022-05-17 10:00 |
| 8 | B | Doc B content as of May 14, 2022 | 2022-05-14 | 2022-05-17 10:00 |
| 9 | A | Doc A content as of May 11, 2022 | 2022-05-11 | 2022-05-18 15:00 |
| 10 | B | Doc B content as of May 11, 2022 | 2022-05-11 | 2022-05-18 15:00 |
- In the table above there were three imports for May 11 (ImportId = "2022-05-11") and two imports for data from May 14 (ImportId = "2022-05-14").
- The latest import run (ImportTime) was at 2022-05-18 15:00
- The latest ImportTime does not necessarily correlate with the latest import data. In my example above, someone ran an import on May 18 at 15:00 but imported the state of the catalog as it was on May 11 (ImportId = "2022-05-11").
Challenge: I need to get the records with the newest ImportId (which would be "2022-05-14") and the latest ImportTime (which would be "2022-05-18 15:00").
For the example above, the result should contain the two rows with ImportId "2022-05-14" and ImportTime "2022-05-17 10:00" (row numbers 7 and 8).
What I tried:
Approach 1
I used arg_max() on ImportTime:
T
| summarize arg_max(ImportTime, *) by ID
This returns the last two rows (9 and 10), where ImportId is "2022-05-11". That's not what I'm after because the newest ImportId is "2022-05-14".
Approach 2
If I use arg_max(ImportId, *) by ID instead, I am getting the ones for "2022-05-14" (rows 5 and 6), but not the ones with the latest ImportTime.
Approach 3
I combined ImportTime and ImportId into an extended column and applied arg_max() on that. This seems to work but I'm unsure if it's correct in all cases?
T
| extend Combined = strcat(ImportId, ImportTime)
| summarize arg_max(Combined, *) by ID
This returns the expected rows 7 and 8 for "2022-05-14" at the import time of "2022-05-17 10:00".
Are there better options?
ImportTimevalues forImportIdwith value "2022-05-11"? Is "import" by this definition is a combination ofImportTimeandImportIdregardless ofValue? By the use ofValuein your code it seems thatValueis the ID of an import, and not just "some content".IDis still not clear in the context of your desired results. IfRowNum6 & 8 would have been removed from the data sample, what would be the desired result then?RowNum7? 7 & 10? Other?