I'm trying to extract everything between

<p><strong>http://website2.website.com/</strong><br />

and

<p><strong>

in HTML like this:

</strong></p>
<p><strong>http://website2.website.com/</strong><br />
<strong>1234123:12rwe</strong><br />
<strong>ewqwe:rjbvm225</strong><br />
<strong>mel35:zzrg</strong><br />
<p><strong>

This is what I have so far

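using System.Text.RegularExpressions;

// Singleline lets '.' match newlines, so (.*) can span several lines
MatchCollection RealMe = Regex.Matches(html, @"website2(.*)<p><strong>", RegexOptions.Singleline);
foreach (Match combo in RealMe)
{
    // Skip any match that contains "xhtml"
    if (!combo.Value.Contains("xhtml"))
    {
        listboxRealMe.Items.Add(combo);
    }
}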

The problem is that everything ends up as a single item in listboxRealMe, instead of one item per line.
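The single-item result follows from the pattern: `website2(.*)<p><strong>` produces one match covering the whole block, and that one `Match` is added as one list box item. One way to get an item per line is to make each entry its own match. The sketch below assumes the markup looks exactly like the sample above (the block starts after the `website2.website.com/` URL line, each entry is wrapped in `<strong>…</strong>`, and the block ends at the next `<p><strong>`); `CredentialExtractor` and `ExtractItems` are hypothetical names, and it returns plain strings so it can run without a form.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CredentialExtractor
{
    // Hypothetical helper: returns one string per <strong>...</strong> entry
    // that follows the URL line, stopping at the next "<p><strong>".
    // The host name in the pattern is assumed from the sample HTML.
    public static List<string> ExtractItems(string html)
    {
        var items = new List<string>();

        // Step 1: isolate the block after the URL line. The lazy (.*?)
        // stops at the first "<p><strong>" rather than the last one.
        Match block = Regex.Match(
            html,
            @"website2\.website\.com/</strong><br />(.*?)<p><strong>",
            RegexOptions.Singleline);
        if (!block.Success)
        {
            return items;
        }

        // Step 2: match each <strong>...</strong> inside the block separately,
        // so every entry becomes its own Match (and later its own list item).
        foreach (Match entry in Regex.Matches(block.Groups[1].Value, @"<strong>(.*?)</strong>"))
        {
            items.Add(entry.Groups[1].Value);
        }
        return items;
    }
}
```

In the form, each returned string could then be added separately, e.g. `foreach (string item in CredentialExtractor.ExtractItems(html)) listboxRealMe.Items.Add(item);`. For HTML that varies more than the sample, an HTML parser such as the library suggested in the comments is likely to be more robust than regular expressions.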

  • Why don't you just use CSQuery for this? It'll be much more robust. Commented Apr 29, 2015 at 20:51
  • Does website2(.*?)<p><strong> improve the situation? Commented Apr 29, 2015 at 20:51
  • @LucasTrzesniewski I've never heard of CSQuery; I'm checking it out right now. Commented Apr 29, 2015 at 20:54
  • OK, so from what you're saying, one match is everything in the middle there, and you added that single match as one list box item. A multi-line text box would probably show it on separate lines, but I'd expect a list box to render each item you add on one line regardless of newline characters. Commented Apr 29, 2015 at 20:56
  • @stribizhev Thanks for that! It got exactly the text between the markers that I was looking for, but it's still added to the list box as one item. Commented Apr 29, 2015 at 20:56
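The `(.*?)` suggestion changes the quantifier from greedy to lazy. With `RegexOptions.Singleline`, a greedy `(.*)` runs to the *last* `<p><strong>` in the input, while a lazy `(.*?)` stops at the *first* one. It still yields one match for the whole block, which is consistent with the list box still showing a single item. A small illustration (the `QuantifierDemo` class and its `Capture` helper are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Returns the text captured by group 1 for the given pattern.
    // Singleline makes '.' match newlines too, as in the question's code.
    public static string Capture(string input, string pattern)
    {
        return Regex.Match(input, pattern, RegexOptions.Singleline).Groups[1].Value;
    }
}
```

For the input `"website2 A <p><strong> B <p><strong>"`, `Capture(input, @"website2(.*)<p><strong>")` returns `" A <p><strong> B "`, whereas `Capture(input, @"website2(.*?)<p><strong>")` returns `" A "`.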

1 Answer

Try using:

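using System;
using System.Text.RegularExpressions;

string resultString = null;
try {
    // Groups[1] is the text captured by (.*); .Value would also include the markers
    resultString = Regex.Match(subjectString, @"\.com/</strong><br />(.*)<p><strong>", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Multiline).Groups[1].Value;
} catch (ArgumentException) {
    // Syntax error in the regular expression
}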

Regex Explanation:

\.com/</strong><br />(.*)<p><strong>

Options: Case insensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Numbered capture

Match the character “.” literally «\.»
Match the character string “com/</strong><br />” literally «com/</strong><br />»
Match the regex below and capture its match into backreference number 1 «(.*)»
   Match any single character «.*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “<p><strong>” literally «<p><strong>»
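Note that `Match.Value` returns the entire match, including the literal `.com/</strong><br />` and `<p><strong>` markers; the text captured by `(.*)` is in `Match.Groups[1].Value`. A minimal comparison using the pattern from this answer (`AnswerPatternDemo`, `WholeMatch` and `Captured` are illustrative names; `RegexOptions.Multiline` is left out because it only affects `^` and `$`, which this pattern does not use):

```csharp
using System;
using System.Text.RegularExpressions;

static class AnswerPatternDemo
{
    // The answer's pattern; group 1 is the "(.*)" between the two markers.
    const string Pattern = @"\.com/</strong><br />(.*)<p><strong>";

    // Whole match: includes the ".com/</strong><br />" and "<p><strong>" markers.
    public static string WholeMatch(string input) =>
        Regex.Match(input, Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline).Value;

    // Group 1 only: just the text between the markers.
    public static string Captured(string input) =>
        Regex.Match(input, Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;
}
```

For the input `"http://website2.website.com/</strong><br />X<p><strong>"`, `WholeMatch` returns `".com/</strong><br />X<p><strong>"` and `Captured` returns `"X"`.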