EDIT: To explain my motivation: I'm writing a command-line utility that takes a log file and a pattern (a non-regex string that describes what each log entry looks like). It converts the pattern into a regex and matches each line of the file against it, producing a collection of log events that are then output in another format (e.g., JSON). I can't make any assumptions about the input pattern or the contents of the file.
I'd like to parse a comma-separated list of key-value pairs and capture the individual keys and values. An example input string:
07/04/2012 <DEBUG> a=1, b=foo, c=bar : hello world!\n
I verified that the regex below correctly extracts the keys and values from the input:
// regex
(([^,\s=]+)=([^,\s=]+)(?:,\s*(?:[^,\s=]+)=(?:[^,\s=]+))*?)
// input string
a=1, b=foo, c=bar
The result is:
// 1st call
group(2) == "a"
group(3) == "1"
// 2nd call
group(2) == "b"
group(3) == "foo"
// 3rd call
group(2) == "c"
group(3) == "bar"
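A minimal sketch of how this check can be reproduced, assuming the regex is applied with repeated Matcher.find() calls. The class name KeyValueFind and the method extract are made up for illustration. Note that the outermost parentheses form group 1 (the whole "key=value" text), so the key and value are read from groups 2 and 3:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the actual utility.
class KeyValueFind {
    // The list regex from above, escaped for a Java string literal.
    static final Pattern PAIRS = Pattern.compile(
        "(([^,\\s=]+)=([^,\\s=]+)(?:,\\s*(?:[^,\\s=]+)=(?:[^,\\s=]+))*?)");

    // Collects one key/value pair per successful find() call, in input order.
    static Map<String, String> extract(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIRS.matcher(input);
        while (m.find()) {
            // group(1) is the outer group; group(2) is the key, group(3) the value.
            pairs.put(m.group(2), m.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(extract("a=1, b=foo, c=bar")); // prints {a=1, b=foo, c=bar}
    }
}
```

Because the trailing quantifier is lazy (`*?`) and nothing follows it, it matches zero repetitions, so each find() call consumes exactly one pair.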
But this regex (the same as the one above, with extra "stuff" around it to match the rest of the line) does not work as expected:
// regex
\d{2}/\d{2}/\d{4} <DEBUG> (([^,\s=]+)=([^,\s=]+)(?:,\s*(?:[^,\s=]+)=(?:[^,\s=]+))*?) : .*
// input string
07/04/2012 <DEBUG> a=1, b=foo, c=bar : hello world!
For some reason, the result is:
group(1) == "a=1, b=foo, c=bar"
group(2) == "a"
group(3) == "1"
// no more matches
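The behaviour can be reproduced with a small sketch like the one below (FullLineMatch and its methods are hypothetical names used only for illustration). It suggests what is going on: the surrounding text forces the lazy quantifier to expand until " : " can match, so a single find() call consumes the whole line and there is nothing left for a second call. The later pairs are matched inside non-capturing groups, so only the first key and value are captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the actual utility.
class FullLineMatch {
    // The full-line regex from above, escaped for a Java string literal.
    static final Pattern LINE = Pattern.compile(
        "\\d{2}/\\d{2}/\\d{4} <DEBUG> "
        + "(([^,\\s=]+)=([^,\\s=]+)(?:,\\s*(?:[^,\\s=]+)=(?:[^,\\s=]+))*?)"
        + " : .*");

    // Counts how many times find() succeeds on the input.
    static int countMatches(String input) {
        Matcher m = LINE.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns groups 1..3 of the first match, or null if there is no match.
    static String[] firstMatchGroups(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "07/04/2012 <DEBUG> a=1, b=foo, c=bar : hello world!";
        System.out.println(countMatches(line)); // prints 1
        System.out.println(String.join(" | ", firstMatchGroups(line)));
        // prints a=1, b=foo, c=bar | a | 1
    }
}
```

Even if the inner groups were capturing, a group inside a repeated construct keeps only the text from its last iteration in Java, so a single match could not expose every pair through group() alone.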
What's the correct Java regex to extract the keys and values?
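One direction that might work, sketched here under assumptions rather than as a definitive answer: use the full-line regex only to isolate the list in group(1), then run a second, pair-only pattern over that text with find(). The names TwoPassExtract, LINE, PAIR, and extract are hypothetical, and in the real utility both patterns would have to be generated from the user-supplied pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-pass sketch, not part of the actual utility.
class TwoPassExtract {
    // Pass 1: the full-line regex from the question; group(1) holds the whole list.
    static final Pattern LINE = Pattern.compile(
        "\\d{2}/\\d{2}/\\d{4} <DEBUG> "
        + "(([^,\\s=]+)=([^,\\s=]+)(?:,\\s*(?:[^,\\s=]+)=(?:[^,\\s=]+))*?)"
        + " : .*");

    // Pass 2: a single key=value pair, taken from the inner part of the regex above.
    static final Pattern PAIR = Pattern.compile("([^,\\s=]+)=([^,\\s=]+)");

    // Returns the pairs in input order, or an empty map if the line does not match.
    static Map<String, String> extract(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher lineMatcher = LINE.matcher(line);
        if (!lineMatcher.matches()) {
            return pairs;
        }
        Matcher pairMatcher = PAIR.matcher(lineMatcher.group(1));
        while (pairMatcher.find()) {
            pairs.put(pairMatcher.group(1), pairMatcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "07/04/2012 <DEBUG> a=1, b=foo, c=bar : hello world!";
        System.out.println(extract(line)); // prints {a=1, b=foo, c=bar}
    }
}
```

The sketch assumes the line has no trailing newline; matches() requires the regex to cover the entire input, and `.` does not match a line terminator by default.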