0

I've already found a lot of stackoverflow questions about this topic. But I cannot find out the solution out of these questions for my problem.

I have the following html:

<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>

I want to find out the <h3> prepended by </a></p>. In this case, the output should be:

<h3>First Title</h3>

So I implement the following regular expression;

preg_match_all('/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s',$html,$data);

The above regular expression cannot output anything from the above html. But if I remove the newlines from the html, the above regular expression can correctly output my desire result.

I would not like to remove newlines from the html if possible. How should I develop regular expression to ignore the newlines from the source string?

Please, help me.

1

1 Answer 1

4

Here comes the use of \K, since you can't use qunatifiers inside the lookaround assertions.

preg_match_all('/<\/a><\/p>\s*\K<h3>(.+?)<\/h3>/s',$html,$data);

or just put \n char inside the lookbehind.

preg_match_all('/(?<=<\/a><\/p>\n)<h3>(.+?)<\/h3>/s',$html,$data);
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.