0

I'm trying to write a Splunk query, and I need to parse out the command line arguments given to a Windows program. Specifically, I'm trying to get the name of the package that is being installed. Here are some examples of the data:

/i "package\name" test
/i "package\name" "test"
/i "package\ name" test
/i "package\ name" "test"
/i package\name test
/package package\name "test"

The package name is always preceded by "/i" or "/package" (either can be upper or lower case), usually followed by a space, although sometimes there is no space. The package name is normally in quotes, but sometimes it isn't. If it's in quotes, it can contain spaces. It is usually followed by more command line arguments, sometimes in quotes and sometimes not, but I don't really care about those. They're represented by the string test/"test". I'm basically trying to get everything between the "i" (or "package") and the command line arguments that come after the package name.

I first tried using \/([iI]|(?i)package)\s?(?<package>.*?)\s to extract the package name into a capture group. The problem was the third and fourth test strings: the space inside the quotes ends the match early, so I'd only end up with "package" instead of "package name".

So I thought maybe I could use one regex to extract everything within quotes, another to extract everything with no quotes, and then combine them.

With the following regex, I can get "package\name" or "package\ name" from the first 4 of the above strings with no issue: \/([iI]|(?i)package)\s?"(?<package1>.*?)"

To get the last 2, I tried to get everything after i/package that didn't start with quotes: \/([iI]|(?i)package)\s?[^"](?<package2>.*?)\s

But, using regex101.com, it seems that this pattern matches the package name for all of the test strings, not just the last 2. It also cuts off the first character in the last 2, so I'd have "ackage\name". I'm not sure why either is happening.
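Both symptoms can be reproduced outside Splunk. The sketch below is a possible explanation, not a confirmed diagnosis, and it rests on some assumptions: it uses Python's re module rather than Splunk's rex, so the named group is written (?P<package2>...), the (?i) flag is moved to the front, and the helper name extract is made up for illustration. It suggests that [^"] is a normal character class that consumes one character, which would explain the missing "p". It also suggests that because \s? is optional, the engine can backtrack, let \s? match nothing, and let [^"] consume the space itself, which would explain why the quoted strings still match. A negative lookahead (?!") checks the next character without consuming it.

```python
import re

# The second attempt, ported to Python syntax (assumption: (?P<name>...) and a leading (?i)).
consuming = re.compile(r'(?i)/(?:i|package)\s?[^"](?P<package2>.*?)\s')

# Same idea, but (?!") only inspects the next character and consumes nothing.
lookahead = re.compile(r'(?i)/(?:i|package)\s?(?!")(?P<package2>\S+)')

def extract(pattern, line):
    """Return the package2 capture, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group("package2") if m else None

unquoted = r'/i package\name test'
quoted = r'/i "package\ name" test'

print(extract(consuming, unquoted))  # ackage\name  ([^"] consumed the "p")
print(extract(consuming, quoted))    # "package\    (\s? matched nothing, [^"] consumed the space)
print(extract(lookahead, unquoted))  # package\name
print(extract(lookahead, quoted))    # None
```

With the lookahead version, package2 stays empty for the quoted strings, so it should not overlap with a separate quoted-name capture.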

If it's possible to extract what I want with one expression, that would be the preferred solution. Being able to extract the package name from just the last 2 test cases would also work, but in that case there should be no overlap between the capture groups: package1 should match the package names in test strings 1-4, and package2 should match 5-6.

UPDATE:

I appreciate everyone's answers. I got some help from a colleague, which I was able to tweak into what I believe is a viable solution. I thought I'd share it in case anyone else finds it helpful: (?i)(\/i)\s?(?:\"(?<package1>[^\"]*)\"|(?<package2>\S+))
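For anyone who wants to check this kind of pattern against the six sample strings, here is a minimal sketch in Python. It is a port under assumptions, not the exact Splunk expression: Python's re needs (?P<name>...) for named groups, the quotes are not backslash-escaped, and "package" is added as an alternative to "i" because the pattern as posted only looks for /i. The names PATTERN, SAMPLES and package_groups are hypothetical.

```python
import re

# Port of the pattern above to Python's re module.
# Assumptions: (?P<name>...) replaces (?<name>...), and "package" is added next to "i".
PATTERN = re.compile(
    r'(?i)/(?:i|package)\s?(?:"(?P<package1>[^"]*)"|(?P<package2>\S+))'
)

SAMPLES = [
    r'/i "package\name" test',
    r'/i "package\name" "test"',
    r'/i "package\ name" test',
    r'/i "package\ name" "test"',
    r'/i package\name test',
    r'/package package\name "test"',
]

def package_groups(line):
    """Return (package1, package2); the branch that did not match is None."""
    m = PATTERN.search(line)
    return (m.group("package1"), m.group("package2")) if m else (None, None)

for sample in SAMPLES:
    print(sample, "->", package_groups(sample))
```

Because the quoted branch is tried first and the two branches are alternatives, only one of package1 and package2 is filled for any given line.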

4
  • Try (?Ji)\/(i|package)\s*(?:"(?<package1>.*?)"|(?<package1>\S+)) Commented Nov 13, 2023 at 19:02
  • No, it would not be possible. And (?i) is better left in the beginning in this case. Commented Nov 13, 2023 at 19:23
  • Please tell us exactly what you want the match to be for each example. Commented Nov 13, 2023 at 20:55
  • Based on your sample data ... this rex seems to work: \/.+?\\\W*(?<pkg_name>\w+) Commented Nov 16, 2023 at 19:02

3 Answers

0

I was able to parse the sample data using this regex. It uses conditional matching to decide if the package_name field should end with a quote or a space.

\/(?:i|package)\s*(\\\")?(?<package_name>(?(1)[^\"]+|\S+))(\1)?


-1

Try the following capture pattern. The package name will be in group 2.

(?i)\/(?:i|package) ?(")?(.+?)(?(1)(?<!\\)"|\s)


-1

This regex101 example is case-insensitive and uses one pattern for quoted package names and a different pattern for non-quoted ones:

(?i)\/(?:i|package)(?:\s*"([^\\]+\\\s*.*?)"|\s+([^\\"]+\\.*?)\s)
