I need help parsing some information out of a large block of text. I am importing a PSD file and want to extract some data from it.
Amongst the text are strings such as this:
\r\nj78876 RANDOM TEXT STRINGS 75 £
What I want to do is find every string that fits this format (perhaps the leading "\r\n" and the trailing "£" can serve as delimiters) and, for each one, extract the code at the start (j78876) and the price at the end (75). Note that the price may have more than two digits.
Strings like this occur many times in the text, each with a different code and price, and I need the code and price from every one of them.
Can anyone suggest a way to do this?
I am not very proficient with regex, so any guidance would be great.
Thanks.
Note: Here is a snippet of the actual text (there is a lot more in the actual file).
Référence Ancienne référence 3Com/H3C Libellé Remarque Prix en €\r\nJ9449A HP V1810-8G Switch 139,00\r\nJ9450A HP V1810-24G Switch 359,00\r\nEdge Switches - Managed \r\nHP Layer 2 Switches - Managed Stackables and Chassis\r\nHP Switch 2510 Series\r\nRéférence Ancienne référence 3Com/H3C Libellé Remarque Prix en €\r\nJ9019B HP E2510-24 Switch 359,00\r \nJ9020A HP E2510-48 Switch 599,00\r\nJ9279A HP E2510-24G Switch 779,00\r\nJ9280A HP E2510-48G Switch 1 569,00\r\nHP Switch 2520 Series\r\nRéférence Ancienne référence 3Com/H3C Libellé Remarque Prix en €\r\nJ9137A HP E2520-8-PoE Switch 489,00\r\nJ9138A HP E2520-24-PoE Switch 779,00\r\nJ9298A HP E2520-8G-PoE Switch 749,00\r\nJ9299A HP E2520- 24G-PoE Switch 1 569,00\r\nHP Layer 2 and 3 Switches - Managed Stackables and Chassis\r \nThe RBP is a recommended price only. \r\nHP Switch 2600 Series\r\nRéférence Ancienne
Update: I found this pattern:
[\\r\\n](\w\d+\w).*?(\d+,\d\d)[\\r\\n]
It worked for me in browser-based regex testers, but it does not work in my C# code:
Regex reg = new Regex(@"[\\r\\n](\w\d+\w).*?(\d+,\d\d)[\\r\\n]", RegexOptions.IgnoreCase);
Match matched = reg.Match(str);
if (matched.Success)
{
string code = matched.Groups[1].Value;
string currencyAmt = matched.Groups[2].Value;
}
Final update: In the browser testers I had to double-escape the \r\n; in my C# code that was not necessary. To go through every match and read its groups, I used the approach from the looping answer:
foreach (Match match in Regex.Matches(content, @"[\r\n](?<code>\w\d+\w).*?(?<price>\d+,\d\d)[\r\n]", RegexOptions.IgnoreCase))
{
string code = match.Groups["code"].Value;
string currencyAmt = match.Groups["price"].Value;
}
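For anyone who finds this later: the loop above captures only "569,00" for a price written as "1 569,00", and it depends on a line-break character sitting on both sides of each row. Below is a minimal sketch of a variant that anchors on line starts and ends with RegexOptions.Multiline and allows space-separated thousands. The class name PriceListParser, the method Extract, and the exact pattern are my own assumptions for illustration, not something taken from the answers. I have only checked the pattern against rows shaped like the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class PriceListParser
{
    // ^ and $ match at line boundaries because of RegexOptions.Multiline.
    // code:  one letter, digits, optional trailing letter (J9449A, j78876)
    // price: 139,00 or 1 569,00 (space-separated thousands, comma decimals)
    // [ \t\r]*$ tolerates a trailing CR or stray blanks before the LF.
    private static readonly Regex RowPattern = new Regex(
        @"^(?<code>[A-Z]\d+[A-Z]?)\b.*?(?<price>\d{1,3}(?: \d{3})*,\d{2})[ \t\r]*$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public static List<(string Code, decimal Price)> Extract(string content)
    {
        var rows = new List<(string Code, decimal Price)>();
        foreach (Match match in RowPattern.Matches(content))
        {
            // Normalise "1 569,00" to "1569.00" so the invariant culture can parse it.
            string raw = match.Groups["price"].Value;
            string normalised = raw.Replace(" ", "").Replace(',', '.');
            decimal price = decimal.Parse(normalised, CultureInfo.InvariantCulture);
            rows.Add((match.Groups["code"].Value, price));
        }
        return rows;
    }
}
```

Calling PriceListParser.Extract(content) returns one (Code, Price) pair per matching row and skips heading lines such as "HP Switch 2520 Series". One caveat: a description that ends in a one- to three-digit number directly before the price (for example "... 8 139,00") would be read as part of the price, so the thousands handling is a trade-off rather than a safe default.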