I have a regular expression that breaks HTML into the pieces I need. I won't present the whole regex because it's too long. In a nutshell, it's a multi-line, row-by-row parser for table cells. Recently I ran into trouble: the layout of the pages I parse has changed, so I started reworking the regex to fit the new layout, but I found that the markup wrapping the data I need in a particular cell may differ from row to row.

What do we have?

The layout of the cell may be like this or like this

which leads me to my question: how do I capture the data I need without ending up with an additional, unnecessary group?

Conditionals in regular expressions are described at regular-expressions.info/conditional.html. I've read it but still don't have a clue.

2 Answers

This should help :)

<td class='(?:class1|class2)'>\s*((?=\w).*)\s*</td>
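
To see what this pattern captures, here is a minimal Python sketch. The sample rows are made up for illustration (the question does not show the real markup), and the helper name `capture` is hypothetical.

```python
import re

# The pattern from this answer. The class names are the answer's placeholders.
CELL = re.compile(r"<td class='(?:class1|class2)'>\s*((?=\w).*)\s*</td>")

# Hypothetical sample cells, invented for this sketch.
rows = [
    "<td class='class1'>  alpha </td>",
    "<td class='class2'>beta</td>",
    "<td class='class3'>gamma</td>",  # class not listed, so no match
]

def capture(row):
    """Return the text captured by group 1, or None if the row does not match."""
    m = CELL.search(row)
    return m.group(1) if m else None

captured = [capture(row) for row in rows]
print(captured)  # ['alpha ', 'beta', None]
```

Note that the greedy `.*` keeps the trailing space in `'alpha '`, so the final `\s*` matches nothing there; calling `.strip()` on the captured text is one way to deal with that. The same greediness means the pattern assumes one such cell per line.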

2 Comments

Thanks for your support and updating your answer accordingly :)
Thank you. Your regexp perfectly fits.

Edited: I adopted regexhacks' expression, as it is the better solution.

Not sure, but maybe you are looking for non-capturing groups, written as (?:...). With them you could do

<td class='class(?:1|2)'>\s*((?=\w).*)\s*</td>

Well, in this example you would not need a group at all, because a character class is enough:

<td class='class[12]'>\s*((?=\w).*)\s*</td>

but in more complex cases you could use them.
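
As a rough illustration of the difference, the Python sketch below compares a capturing group, a non-capturing group, and the character class on one made-up cell. The sample string and variable names are assumptions for this sketch, not taken from the question.

```python
import re

# Three variants of the same pattern; only the class-number part differs.
capturing = re.compile(r"<td class='class(1|2)'>\s*((?=\w).*)\s*</td>")
non_capturing = re.compile(r"<td class='class(?:1|2)'>\s*((?=\w).*)\s*</td>")
char_class = re.compile(r"<td class='class[12]'>\s*((?=\w).*)\s*</td>")

sample = "<td class='class2'>value</td>"  # hypothetical cell

# A plain (...) adds an extra group that has to be skipped when reading results.
print(capturing.groups, capturing.search(sample).groups())          # 2 ('2', 'value')

# (?:...) groups the alternation without creating a numbered group.
print(non_capturing.groups, non_capturing.search(sample).groups())  # 1 ('value',)

# The character class needs no group at all for single-character alternatives.
print(char_class.groups, char_class.search(sample).groups())        # 1 ('value',)
```

With the non-capturing form, the wanted data stays in group 1 no matter how many alternatives are listed in front of it.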

See sample: rubular

But this might not be what you want. Could you give a more precise example of the problem?

1 Comment

Thank you for your response. I think I'm now starting to understand how conditional regexps work.
