
I have a set of strings gathered from logs that I'm trying to parse into unique entries:

function Scan ($path, $logPaths, $pattern) 
{
$logPaths | % `
{ 
    $file = $_.FullName
    Write-Host "`n[$file]"
    Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
    {   
        $regexDateTime = New-Object System.Text.RegularExpressions.Regex "((?:\d{4})-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(,\d{3})?)"
        $matchDate = $regexDateTime.match($_)
        if($matchDate.success)              
        {
            $loglinedate = [System.DateTime]::ParseExact($matchDate, "yyyy-MM-dd HH:mm:ss,FFF", [System.Globalization.CultureInfo]::InvariantCulture)
            if ($loglinedate -gt $laterThan)
            {                   
                $date = $($_.toString().TrimStart() -split ']')[0]
                $message = $($_.toString().TrimStart() -split ']')[1]
                $messageArr += ,$date,$message  
            }                                           
        }
    }
    $messageArr | sort $message -Unique | foreach { Write-Host -f Green $date$message}
}
}

So for this input:


2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.

2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.

2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer as the entity has a higher version on the consumer side.


Only the second two entries should be returned (the first line is an exact duplicate of the second).

I'm having trouble filtering out duplicate $message values: currently all entries are being returned (sort -Unique is not behaving as I would expect it to). I also need the correct $date to be returned against each filtered $message.

I'm pretty stuck with this, can anyone help?

  • I'm having difficulty making sense of the text formatting you are using since I can't see the input. As a general suggestion, try piping to Group-Object. It will automatically group unique values and give you the total count of each (see the sketch below). Commented Sep 15, 2015 at 22:37
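For reference, a minimal sketch of the Group-Object idea from the comment above, run against raw lines like the sample input (the $lines variable is purely illustrative and the message text is shortened):

    # Hypothetical array of raw log lines, shortened from the sample input
    $lines = @(
        '2015-09-04 07:50:06 [20] WARN ... A DocumentLibrary 3 could not be found.'
        '2015-09-04 07:50:06 [20] WARN ... A DocumentLibrary 3 could not be found.'
        '2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete ...'
    )

    # Group-Object collapses identical values and reports how often each occurred
    $lines | Group-Object | Select-Object Count, Name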

2 Answers


We can do what you want, but first let's back up a little to help us do this better. Right now you have an array of arrays, and that's difficult to work with in general. What would be better is an array of objects, where each object has properties such as Date and Message. Let's start there.

        if ($loglinedate -gt $laterThan)
        {                   
            $date = $($_.toString().TrimStart() -split ']')[0]
            $message = $($_.toString().TrimStart() -split ']')[1]
            $messageArr += ,$date,$message  
        }                                           

is going to become...

        if ($loglinedate -gt $laterThan)
        {                   
            [Array]$messageArr += [PSCustomObject]@{
                'date' = $($_.toString().TrimStart() -split ']')[0]
                'message' = $($_.toString().TrimStart() -split ']')[1]
            }
        }                                           

That produces an array of objects, and each object has two properties, Date and Message. That will be much easier to work with.
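As a quick illustration of what that buys you (assuming $messageArr has been built up as above), the values can now be addressed by property name instead of by array index:

    # Each element is a single object with named properties
    $messageArr[0].Date
    $messageArr[0].Message

    # The whole collection can be displayed or piped by property name
    $messageArr | Format-Table Date, Message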

If you only want the latest version of any message, that's easily done with the Group-Object command, like so:

$FilteredArr = $messageArr | Group Message | ForEach{$_.Group|sort Date|Select -Last 1}

Then if you want to display it to the screen like you are, you could do:

$FilteredArr|ForEach{Write-Host -f Green ("{0}`t{1}" -f $_.Date, $_.Message)}

1 Comment

Works perfectly, and excellent explanation. Thanks very much.

My take (not tested):

function Scan ($path, $logPaths, $pattern) 
{

$regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
$ht = @{}

$logPaths | % `
   { 
     $file = $_.FullName
     Write-Host "`n[$file]"
     Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
      {   
       if ($_.line -match $regex -and $matches[1] -gt $ht[$matches[2]])
         { $ht[$matches[2]] = $matches[1] }
      }  
    $ht.GetEnumerator() |
     sort Value |
     foreach { Write-Host -f Green "$($_.Value)$($_.Name)" }
  }
}

This splits each matching line at the timestamp and loads the two parts into a hash table, using the error message as the key and the timestamp as the value (this will de-dupe the messages in-stream).

The timestamps are already in string-sortable format (yyyy-MM-dd HH:mm:ss), so there's really no need to cast them to [datetime] to find the latest one. Just do a straight string compare, and if the incoming timestamp is greater than an existing value for that message, replace the existing value with the new one.
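For example, with the timestamps from the sample input, plain string comparison already agrees with chronological order:

    # Lexicographic comparison of 'yyyy-MM-dd HH:mm:ss' strings matches time order
    '2015-09-04 07:50:16' -gt '2015-09-04 07:50:06'    # True
    '2015-09-04 07:50:06' -gt '2015-09-04 07:50:16'    # False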

When you're done, you should have a hash table with a key for each unique message found, having a value of the latest timestamp found for that message.
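A self-contained sketch of that de-dupe step, run against the three sample lines from the question ($sampleLines is purely illustrative and the message text is shortened here for readability):

    $regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
    $ht = @{}
    $sampleLines = @(
        '2015-09-04 07:50:06 [20] WARN DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.'
        '2015-09-04 07:50:06 [20] WARN DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.'
        '2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer.'
    )
    $sampleLines | % {
        # keep the newest timestamp seen for each distinct message
        if ($_ -match $regex -and $matches[1] -gt $ht[$matches[2]]) { $ht[$matches[2]] = $matches[1] }
    }
    $ht.Count    # 2 - one entry per unique message, holding its latest timestamp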

2 Comments

Thanks for your reply. I should have posted the full code; the date parsing is there so I can use a $laterthan variable to only select dates -gt $laterthan. I tried the code below and get 'Index operation failed; the array index evaluated to null.': Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % { if ($_.line -match '].*' -and $ht[$matches[2]] -gt $matches[1]) { $ht[$matches[2]] = $matches[1] } }
It won't work with if ($_.line -match '].*'); you need to use a regex with capture groups (like the one in the example). As far as the date parsing goes, just make $laterthan a string value in the same date format - e.g. '2015-01-01 00:00:00' and it will work fine. Try it yourself.
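For reference, a sketch of how that comment's suggestion could be folded into the loop body of the answer above ($laterThan is assumed to be a string such as '2015-01-01 00:00:00', in the same format as the log timestamps):

    # Only record a message if its timestamp is after $laterThan AND newer than what is already stored
    if ($_.line -match $regex -and $matches[1] -gt $laterThan -and $matches[1] -gt $ht[$matches[2]])
      { $ht[$matches[2]] = $matches[1] }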
