
Say I have a filename string, something like:

test_ABC_19000101_010101.987.txt,

Where "test" could be any combination of white space, characters, numbers, etc. I wish to extract the 19000101_010101 part (date and time) with PowerShell. Currently I am assigning the result of -split "_ABC_" to a variable and taking the second element of the array. I am then splitting that string several more times. Is there a way to accomplish this in one go?

PS

"_ABC_" is constant, occurring unchanged in all instances of filename(s).

  • Is _ABC_ always constant, and does the time always end with a . ? Commented Dec 20, 2021 at 22:06
  • @SantiagoSquarzon, Correct. The "test" part is the messy bit. "19000101_010101.987" = Jan 01, 1900 01:01:01 and 987 milliseconds. Commented Dec 20, 2021 at 22:10

3 Answers


A more concise - albeit perhaps more obscure - alternative to Santiago Squarzon's helpful answer:

# Construct a regex that consumes the entire file name while
# using capture groups for the parts of interest.
$re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})\..+'

[datetime] (
  # In the replacement string, use $1, $2, ... to refer to what the
  # first, second, ... capture group captured.
  'test_ABC_19000101_010101.987.txt' -replace $re, '$1-$2-$3T$4:$5:$6.$7'
)

Output:

Monday, January 1, 1900 1:01:01 AM

The -replace operation results in string '1900-01-01T01:01:01.987', which is a (culture-invariant) format that you can use as-is with a [datetime] cast.

Note that with a Get-ChildItem call as input you could slightly simplify the regex by providing $_.BaseName rather than $_.Name as the -replace LHS, which obviates the need to also match the extension (\..+) at the end of the regex.
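The BaseName variant mentioned above could be sketched roughly as follows. The function name ConvertTo-FileTimestamp is hypothetical, and the commented-out Get-ChildItem call uses a placeholder filter; only the regex and the replacement string come from the answer itself.

```powershell
# Hypothetical helper (the name is illustrative, not part of the answer above).
function ConvertTo-FileTimestamp {
    param([Parameter(Mandatory)] [string] $BaseName)

    # Same capture groups as before, minus the trailing '\..+' for the extension.
    $re = '^.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})$'

    # -replace returns its input unchanged when nothing matches,
    # so fail early with a clearer message than a failed cast would give.
    if ($BaseName -notmatch $re) {
        throw "Unexpected file name format: $BaseName"
    }

    [datetime] ($BaseName -replace $re, '$1-$2-$3T$4:$5:$6.$7')
}

# With real files this might look like (the filter is a placeholder):
#   Get-ChildItem -File -Filter '*_ABC_*' |
#     ForEach-Object { ConvertTo-FileTimestamp $_.BaseName }

$ts = ConvertTo-FileTimestamp 'test_ABC_19000101_010101.987'
$ts.ToString('o')   # 1900-01-01T01:01:01.9870000
```

The explicit -notmatch guard is a design choice for this sketch: without it, a non-matching name would be passed to the [datetime] cast as-is and fail with a less helpful conversion error.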


An aside re the [datetime] cast: [datetime] '...' results in a [datetime] instance that is an unspecified timestamp (its .Kind property value is Unspecified), i.e. it is undefined whether it represents a Local or a Utc timestamp.

To get a Local timestamp, use
[datetime]::Parse('...', [cultureinfo]::InvariantCulture, 'AssumeLocal')
(use 'AssumeLocal, AdjustToUniversal' to get a Utc timestamp).

Alternatively, you can cast to [datetimeoffset] - a type that is generally preferable to [datetime] - which interprets a string cast to it as local by default. (You can then access its .LocalDateTime / .UtcDateTime properties to get Local / Utc [datetime] instances).
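To make the difference between these options concrete, here is a small sketch that inspects the .Kind property for each of them. The variable names are illustrative; the calls themselves are the ones described above.

```powershell
$s = '1900-01-01T01:01:01.987'

# Plain cast: the kind of the timestamp is left unspecified.
$unspecified = [datetime] $s
$unspecified.Kind            # Unspecified

# Parse and explicitly treat the string as local time.
$local = [datetime]::Parse($s, [cultureinfo]::InvariantCulture, 'AssumeLocal')
$local.Kind                  # Local

# Treat the string as local time, then convert the result to UTC.
$utc = [datetime]::Parse($s, [cultureinfo]::InvariantCulture, 'AssumeLocal, AdjustToUniversal')
$utc.Kind                    # Utc

# [datetimeoffset] interprets the string as local and records the offset.
$dto = [datetimeoffset] $s
$dto.LocalDateTime.Kind      # Local
$dto.UtcDateTime.Kind        # Utc
```

The actual clock values of $utc and $dto.UtcDateTime depend on the time zone of the machine running the code, which is why only the .Kind values are shown here.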


2 Comments

That last bit ($_.BaseName) is good stuff! ISO8601 is good for invariance and interacting with other APIs, for sure, but no need for that in my simple application. To wit, I'm constantly trying to convince clients/coworkers that working in local time is a really bad idea... I like the self-describing nature of @Santiago's answer.
@osprey, I've folded my previous (since-deleted) comment re the .Kind property of the resulting [datetime] instance into the answer, along with an alternative using [datetimeoffset]. (And, to reiterate: agreed re @Santiago's answer).

This regex may seem like overkill, but it should work as long as _ABC_ is constant, there is a _ separating the date from the time, and there is a . separating the time from the milliseconds:

$re = [regex]'(?<=_ABC_)(?<date>\d*)_(?<time>\d*)\.(?<millisec>\d*)(?=\.)'

@'
test_ABC_19000101_010101.987.txt
t' az@ 0est_ABC_20000101_090101.123.txt
tes8as712t_ABC_21000101_080101.456.txt
te098d $st_ABC_22000101_070101.789.txt
[test]_ABC_23000101_060101.101.txt
t?\est_ABC_24000101_050101.112.txt
'@ -split '\r?\n' | ForEach-Object {

    $groups = $re.Match($_).Groups
    $date = $groups['date']
    $time = $groups['time']
    $msec = $groups['millisec']

    [datetime]::ParseExact(
        "$date $time $msec",
        "yyyyMMdd HHmmss fff",
        [cultureinfo]::InvariantCulture
    )
}

See https://regex101.com/r/8oSpqf/1 for details.
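A possible variation on the same idea, offered only as a sketch: because _ and . are treated as literal characters in a custom date format string, the whole timestamp can be captured as a single match and handed to ParseExact in one step. The tighter \d{8}, \d{6} and \d{3} quantifiers and the variable names are assumptions of this sketch, not part of the answer above.

```powershell
# Match the whole timestamp (date, time, milliseconds) in one piece.
# The lookarounds require '_ABC_' before it and a '.' after it.
$re = [regex] '(?<=_ABC_)\d{8}_\d{6}\.\d{3}(?=\.)'

$name = 'test_ABC_19000101_010101.987.txt'
$m = $re.Match($name)

if ($m.Success) {
    # '_' and '.' in the format string are matched literally.
    $stamp = [datetime]::ParseExact(
        $m.Value,
        'yyyyMMdd_HHmmss.fff',
        [cultureinfo]::InvariantCulture
    )
    $stamp.ToString('o')   # 1900-01-01T01:01:01.9870000
}
```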

2 Comments

Good quickness! Bonus points for presuming (correctly) that I'd be parsing a list of filenames returned with Get-ChildItem. This is really helpful for me in learning to incorporate regex into my PS usage. Is -split '\r?\n' for line end compatibility?
@osprey that's right, \r?\n is friendly for both Windows and Linux users :) happy to help, I too am learning regex hehe

If there will never be multiple sequences in the filename that look like the timestamp (8 digits, _, 6 digits), then you could match on that pattern of digits.

PS C:\> 'test_ABC_19000101_010101.987.txt' -match '^.*ABC_(\d{8}_\d{6})\..*'
True
PS C:\> $Matches

Name                           Value
----                           -----
1                              19000101_010101
0                              test_ABC_19000101_010101.987.txt

PS C:\> $Matches[1]
19000101_010101

In practice, you would use the file's name in place of the literal string.

If you want to get a [System.DateTime] from it:

PS C:\> [datetime]::ParseExact($Matches[1], 'yyyyMMdd_HHmmss', $null)

Monday, January 1, 1900 01:01:01
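When this is applied to several names in a loop, it may be worth guarding each -match, because $Matches is only updated when a match succeeds and would otherwise still hold the previous result. A minimal sketch, assuming a hypothetical list of names in $names (with real files these would come from Get-ChildItem):

```powershell
# Hypothetical input; the second name deliberately contains no timestamp.
$names = 'test_ABC_19000101_010101.987.txt', 'no_timestamp_here.txt'

# @( ... ) ensures an array even if only one name matches.
$results = @(
    foreach ($name in $names) {
        # Only read $Matches inside the 'if': it is stale after a failed -match.
        if ($name -match '_ABC_(\d{8}_\d{6})\.') {
            [datetime]::ParseExact(
                $Matches[1],
                'yyyyMMdd_HHmmss',
                [cultureinfo]::InvariantCulture
            )
        }
    }
)

$results.Count   # 1
```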

2 Comments

No, can't use this as I'm trying to determine time from the filename. Each timestamp is unique and non-repeating.
Yes, system time is what I had been using, but proved problematic as it was not always exactly matching the time of file creation.
