I have an HTML page:

<tr>
<td rowspan="7">
<a href="http://www.link1.com/" style="text-decoration: none;">
        <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
        </a>
    </td>
    <td colspan="2" rowspan="2">
        <a href='http://www.link1.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>
<td colspan="2" rowspan="2">
        <a href='http://www.url.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>

I want to replace every URL in an href attribute with mytest.com?url=$link, where $link is the original URL.

I tried:

    $messaget = preg_replace('/<a(.*)href="([^"]*)"(.*)>/','mytest.com?url=$2',$messaget);
  • PHP is server-side code... so I'm not sure what/how you're trying to accomplish your result. Commented Aug 29, 2013 at 16:05
  • Sorry, my HTML code is in the variable $messaget. Commented Aug 29, 2013 at 16:13
  • You should never use regex for dealing with HTML code; use an HTML parser instead. See stackoverflow.com/questions/3577641/… and simplehtmldom.sourceforge.net. Commented Aug 29, 2013 at 16:17

4 Answers

This may help you in the short run:

$messaget = preg_replace('/<a ([^>]*)href=[\'"]([^\'"]*)[\'"]([^>]*)>/', '<a $1href="mytest.com?url=$2"$3>', $messaget);

In your regex you were using href="...", that is, double quotes, but in your HTML you have a mixture of both double and single quotes.

And in the replacement string you forgot to include $1 and $3.

That said, DO NOT use regex to parse HTML. The answer by @BenLanc below is better; use that instead, and read the link he posted.
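If you do stay with a regex for now, here is a minimal sketch of a slightly stricter variant. It is not the pattern from this answer: it captures the opening quote and requires the same quote to close the value, and it URL-encodes the original link via preg_replace_callback. The function name rewriteHrefs and the $prefix parameter are made up for illustration.

```php
<?php
// Hypothetical helper: rewriteHrefs and $prefix are illustrative names, not from the question.
function rewriteHrefs(string $html, string $prefix = 'mytest.com?url='): string
{
    // ([\'"]) captures the opening quote; \2 requires the same quote to close the value.
    // [^>]* keeps the match inside a single <a ...> tag.
    $pattern = '/<a\s([^>]*?)href=([\'"])(.*?)\2([^>]*)>/i';

    $result = preg_replace_callback($pattern, function (array $m) use ($prefix): string {
        // $m[1] and $m[4] are the other attributes; $m[3] is the original URL.
        return '<a ' . $m[1] . 'href="' . $prefix . urlencode($m[3]) . '"' . $m[4] . '>';
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$messaget = "<a href='http://www.url.com/test.php?c=1'>x</a>";
echo rewriteHrefs($messaget), "\n";
```

Encoding the original URL matters here because it contains its own ? and = characters, which would otherwise be mixed up with the query string of the new link.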

3 Comments

His regex works only for double quotes; the issue is that he needs to account for both double and single quotes, as in his supplied sample. So the regex you've provided still won't do the job, though you did fix the replacement error.
You're right. I didn't notice earlier that he also had double quotes; I saw only the single quotes. Thanks, will fix now.
@janos It's cool, undid the downvote and deleted my comment when you updated your answer
Don't use regex on HTML; HTML is not a regular language.

Assuming your markup is valid (and if it's not, pass it through Tidy first), you should use XPath to grab the elements and then update the href directly. For example:

<?php
$messaget = <<<XML
<tr>
  <td rowspan="7">
    <a href="http://www.link1.com/" style="text-decoration: none;">
      <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
    </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.link1.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.url.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
</tr>
XML;

$xml   = new SimpleXMLElement($messaget);

// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");

// Loop through the links and update the href, don't forget to url encode the original!
foreach($links as $link)
{
  $link["href"] = sprintf("mytest.com/?url=%s", urlencode((string) $link["href"]));
}

// Return your HTML with transformed hrefs!
$messaget = $xml->asXml();
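
If the markup is not well-formed XML and Tidy is not available, one possible alternative (an assumption on my part, not what this answer uses) is PHP's DOMDocument, whose loadHTML() tolerates ordinary HTML. A minimal sketch with shortened sample markup; the $rewritten array is only there to make the result easy to inspect:

```php
<?php
// Shortened sample markup with one double-quoted and one single-quoted href.
$messaget = '<table><tr><td><a href="http://www.link1.com/">one</a></td>'
          . "<td><a href='http://www.url.com/test.php?c=1'>two</a></td></tr></table>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($messaget);
libxml_clear_errors();

$rewritten = [];                    // illustrative: records the new href values
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        // URL-encode the original link before embedding it in the new one.
        $new = 'mytest.com/?url=' . urlencode($a->getAttribute('href'));
        $a->setAttribute('href', $new);
        $rewritten[] = $new;
    }
}

// saveHTML() returns the whole document, including the doctype and the
// <html>/<body> wrappers that loadHTML() adds around a fragment.
$messaget = $doc->saveHTML();
```

Because the parser handles the quoting, the single-quote versus double-quote problem from the regex approach does not come up.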

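Don't forget the /m modifier at the end of your regexp, since you are matching against a multiline source. Note that /m only changes where ^ and $ match; it is the /s modifier that lets . match newlines: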

PHP Doc PCRE
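
As a quick check of how the PCRE modifiers behave on multiline input, here is a small sketch. The sample string is made up; it is not taken from the question, whose <a> tags each sit on a single line.

```php
<?php
// Made-up sample: an <a> tag whose attributes start on the next line.
$text = "<a\nhref='x'>";

// Without modifiers, . does not match a newline, so this does not match.
$plain = preg_match('/<a.*href/', $text);    // 0

// /m only makes ^ and $ match at line boundaries; . still stops at \n.
$multi = preg_match('/<a.*href/m', $text);   // 0

// /s (PCRE_DOTALL) lets . match newlines as well.
$dotall = preg_match('/<a.*href/s', $text);  // 1

// /m at work: ^ matches at the start of the second line.
$anchor = preg_match('/^href/m', $text);     // 1

var_dump($plain, $multi, $dotall, $anchor);
```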

Regex to match a URL:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/  

More background info
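
A short sketch of how this pattern behaves with preg_match, using two of the URLs from the question. The helper name looksLikeUrl is invented for illustration. Because the pattern is anchored with ^ and $, it validates a whole string rather than finding links inside HTML, and its character classes do not include ? or =, so it appears to reject URLs that carry a query string.

```php
<?php
// The pattern from this answer, unchanged.
$urlPattern = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

// Hypothetical helper: true when the whole string matches the pattern.
function looksLikeUrl(string $candidate, string $pattern): bool
{
    return preg_match($pattern, $candidate) === 1;
}

// A plain URL from the question matches.
$plainUrl = looksLikeUrl('http://www.link1.com/', $urlPattern);

// A URL with a query string does not, because ? and = are not in any character class.
$queryUrl = looksLikeUrl('http://www.link1.com/test.php?c=1', $urlPattern);

var_dump($plainUrl, $queryUrl);
```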
