PHP: How to ignore newlines in a regex

Question

I've already found many Stack Overflow questions on this topic, but none of them solves my problem.

I have the following html:

<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>

I want to find each <h3> that is preceded by </a></p>. In this case, the output should be:

<h3>First Title</h3>

So I wrote the following regular expression:

preg_match_all('/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s',$html,$data);

The regular expression above matches nothing in this HTML. But if I remove the newlines from the HTML, it returns exactly the result I want.
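Why this happens: the lookbehind requires </a></p> to end exactly where <h3> begins, so the newline between them blocks the match. The /s modifier does not help, because it only changes what . matches; it has no effect on the literal text inside the lookbehind. The sketch below runs the question's pattern on the sample HTML with and without newlines; the variable names are illustrative.

```php
<?php
// The question's pattern: the lookbehind demands that "</a></p>" sit
// immediately before "<h3>", with nothing (not even "\n") in between.
$pattern = '/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s';

$withNewlines = "<p><a name=\"first-title\"></a></p>\n"
    . "<h3>First Title</h3>\n"
    . "<h2><a href='#second'>Second Title</a></h2>\n"
    . "<h3>Third Title</h3>";

// Removing the newlines makes "</a></p><h3>" adjacent, so the lookbehind succeeds.
$withoutNewlines = str_replace("\n", '', $withNewlines);

$countWith = preg_match_all($pattern, $withNewlines, $dataWith);
$countWithout = preg_match_all($pattern, $withoutNewlines, $dataWithout);

echo $countWith, PHP_EOL;          // 0
echo $countWithout, PHP_EOL;       // 1
echo $dataWithout[1][0], PHP_EOL;  // First Title
```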

I'd rather not strip the newlines from the HTML. How can I write the regular expression so that it ignores the newlines in the source string?

Please help me.

Read this: stackoverflow.com/questions/1732348/…. Regexes are NOT the way to parse HTML.
– Jojodmo, commented Jun 28, 2015 at 22:00
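Following the comment's advice, a parser-based alternative might look like the sketch below. It assumes PHP's DOM extension is available, and the XPath condition ("the nearest preceding element is a <p> that contains an <a>") is an approximation of "preceded by </a></p>" chosen for illustration, not the asker's exact rule. Newlines between tags are whitespace text nodes, which the element-only axis step skips, so no special handling is needed.

```php
<?php
$html = <<<'HTML'
<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// preceding-sibling::*[1] is the nearest preceding *element* sibling;
// whitespace text nodes (the newlines) are not elements, so they are ignored.
$nodes = $xpath->query('//h3[preceding-sibling::*[1][self::p][a]]');

$titles = [];
foreach ($nodes as $node) {
    $titles[] = $node->textContent;
}

echo implode(PHP_EOL, $titles), PHP_EOL; // First Title
```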

Avinash Raj · Accepted Answer · Score 4 · 2015-06-28 16:49:02Z

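This is where \K comes in. PCRE does not allow an unbounded quantifier such as \s* inside a lookbehind assertion, so you can't write (?<=<\/a><\/p>\s*). \K resets the start of the reported match instead: the </a></p> and any whitespace matched before it are required but left out of the result.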

preg_match_all('/<\/a><\/p>\s*\K<h3>(.+?)<\/h3>/s',$html,$data);
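A minimal check of the \K pattern against the question's HTML, assuming the sample markup below. It also runs the same pattern on a copy with Windows line endings (\r\n), since \s* accepts any whitespace between the tags.

```php
<?php
$html = "<p><a name=\"first-title\"></a></p>\n"
    . "<h3>First Title</h3>\n"
    . "<h2><a href='#second'>Second Title</a></h2>\n"
    . "<h3>Third Title</h3>";

// \s* consumes any whitespace (spaces, tabs, \r, \n) between the tags;
// \K then drops everything matched so far from the reported match,
// so $data[0] starts at "<h3>" just as it would with a lookbehind.
$pattern = '/<\/a><\/p>\s*\K<h3>(.+?)<\/h3>/s';

$count = preg_match_all($pattern, $html, $data);
$crlfCount = preg_match_all($pattern, str_replace("\n", "\r\n", $html), $crlfData);

echo $count, PHP_EOL;       // 1
echo $data[0][0], PHP_EOL;  // <h3>First Title</h3>
echo $data[1][0], PHP_EOL;  // First Title
echo $crlfCount, PHP_EOL;   // 1
```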

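Alternatively, put a literal \n inside the lookbehind. This only works when exactly one \n (no \r and no indentation) separates </p> from <h3>.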

preg_match_all('/(?<=<\/a><\/p>\n)<h3>(.+?)<\/h3>/s',$html,$data);
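A small sketch of the fixed-length lookbehind, using the question's sample HTML. It shows the pattern matching with Unix line endings and failing once the same markup uses \r\n, which is the limitation the \K version avoids.

```php
<?php
$html = "<p><a name=\"first-title\"></a></p>\n"
    . "<h3>First Title</h3>\n"
    . "<h2><a href='#second'>Second Title</a></h2>\n"
    . "<h3>Third Title</h3>";

// Fixed-length lookbehind: exactly one "\n" between "</p>" and "<h3>".
$pattern = '/(?<=<\/a><\/p>\n)<h3>(.+?)<\/h3>/s';

$unixCount = preg_match_all($pattern, $html, $data);

// Same markup with Windows line endings: the text before "<h3>" is now
// "</p>\r\n", so the lookbehind no longer matches.
$windowsCount = preg_match_all($pattern, str_replace("\n", "\r\n", $html), $unused);

echo $unixCount, PHP_EOL;     // 1
echo $data[1][0], PHP_EOL;    // First Title
echo $windowsCount, PHP_EOL;  // 0
```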

edited Jun 28, 2015 at 16:49

answered Jun 28, 2015 at 16:45

