0

Possible Duplicate:
Regular expression for parsing links from a webpage?
RegEx match open tags except XHTML self-contained tags

I need a regular expression to strip HTML <a> tags. Here is a sample:

<a href="xxxx" class="yyy" title="zzz" ...> link </a>

should be converted to

 link
3
  • 5
    Do you 'need' a regular expression? Commented Sep 23, 2011 at 16:46
  • @josh3736 I will feast on your Unicorn's blood. Commented Sep 26, 2011 at 23:18
  • In what language? HTML doesn't have regular expressions. Commented Sep 29, 2011 at 1:53

5 Answers 5

13

I think you're looking for: </?a(|\s+[^>]+)>


3 Comments

When have you ever seen just an <a> tag?
I edited it to account for strange cases like that anyway.
Doesn't match < a> or < /a>.
3

Answers given above would match valid html tags such as <abbr> or <address> or <applet> and strip them out erroneously. A better regex to match only anchor tags would be

</?a(?:(?= )[^>]*)?>
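A quick way to check this claim is to run the loosest pattern in this thread, `</?a.*?>`, and the lookahead pattern above over a handful of tags. This is only a sketch in Python's `re` module (my choice; the question names no language), and the sample tags are made up for illustration.

```python
import re

# Loosest pattern from this thread: "a" followed by anything up to ">".
LOOSE = re.compile(r"</?a.*?>")
# Pattern from this answer: "a" must be followed by a space or directly by ">".
STRICT = re.compile(r"</?a(?:(?= )[^>]*)?>")

# Illustrative sample tags (not from the question).
samples = ['<a href="x">', "</a>", "<a>", "<abbr>", "<address>", "</applet>"]

loose_hits = [s for s in samples if LOOSE.fullmatch(s)]
strict_hits = [s for s in samples if STRICT.fullmatch(s)]

print(loose_hits)   # every sample, including abbr/address/applet
print(strict_hits)  # only the three anchor tags
```

Note that the lookahead accepts only a literal space after the `a`, so an anchor tag whose name is followed by a tab or newline would not be matched by this pattern.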

1 Comment

I've used this one with the free edition of sublime text 3. Worked best in my case.
2

Here's what I would use:

</?a\b[^>]*>
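For illustration, here is how that pattern might be applied as a replace-with-empty-string in Python. The function name `strip_anchor_tags`, the sample string, and the `re.IGNORECASE` flag are my own assumptions, not part of the answer.

```python
import re

# \b keeps "a" from matching as a prefix of abbr, address, applet, etc.
# IGNORECASE is an added assumption so that <A HREF=...> is handled too.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)

def strip_anchor_tags(html: str) -> str:
    """Remove <a ...> and </a> tags but keep the text between them."""
    return ANCHOR_TAG.sub("", html)

sample = '<p>See <a href="xxxx" class="yyy" title="zzz">link</a> and <abbr>HTML</abbr>.</p>'
result = strip_anchor_tags(sample)
print(result)  # <p>See link and <abbr>HTML</abbr>.</p>
```

Like the other patterns here, this stops at the first `>`, so an attribute value that itself contains `>` would be cut short.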

Comments

2

You're going to have to use this hackish solution iteratively, and it probably won't even work perfectly for complicated HTML:

<a(\s[^>]*)?>.*?(</a>)?

Alternatively, you can try one of the existing HTML sanitizers/parsers out there.


HTML is not a regular language; any regex we give you will not be 'correct'. It's impossible. Even Jon Skeet and Chuck Norris can't do it. Before I lapse into a fit of rage, like @bobince [in]famously once did, I'll just say this:

Use an HTML parser.

(Whatever they're called.)
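As one possible shape for the parser route, here is a minimal sketch using Python's standard-library `html.parser`. The question does not say which language is in use, so Python is an assumption, and `AnchorStripper` and `strip_anchor_tags` are hypothetical names. The idea is to re-emit everything the parser sees except the `a` start and end tags.

```python
from html.parser import HTMLParser

class AnchorStripper(HTMLParser):
    """Re-emit the input as parsed, dropping only <a> start and end tags."""

    def __init__(self):
        # Keep entities such as &amp; unconverted so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

def strip_anchor_tags(html: str) -> str:
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

# Illustrative input: note the ">" inside the quoted title attribute.
sample = '<p>A <a href="x" title="a>b">link</a> &amp; <abbr>more</abbr><br/></p>'
result = strip_anchor_tags(sample)
print(result)
```

Unlike the `[^>]*` patterns, the parser treats the `>` inside the quoted `title` value as part of the attribute. End tags are rebuilt from the lowercased tag name, so the output is not guaranteed to be byte-for-byte identical to the input outside the anchors.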


EDIT:

If you want to 'incorrectly' strip out </a>s that don't have any <a>s as well, do this:

</?[a\s]*[^>]*>
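It may be worth testing that last pattern before relying on it. As far as I can tell, `[a\s]*` is allowed to match nothing and `[^>]*` then accepts any tag name, so the pattern is not limited to anchors. A small Python check, with a made-up sample string:

```python
import re

# The pattern from the EDIT above, unchanged.
EDIT_PATTERN = re.compile(r"</?[a\s]*[^>]*>")

# Illustrative input containing non-anchor tags as well.
sample = '<p><b>bold</b> <a href="x">link</a></p>'
stripped = EDIT_PATTERN.sub("", sample)
print(stripped)  # bold link
```

In this run the `<p>` and `<b>` tags are removed along with the anchor tags.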

4 Comments

Your regex: <a(\s[^>]*)?>(</a>)? does not match </a> closing tags (except for the case where the A element is empty).
@ridgerunner Since regexes don't have memory, putting a .*? in between the two is the best I can do. It'll break down for more complicated HTML.
Just curious: Why are you worried about the tag's text at all?
@BillCriswell Oh, damn, I just realized the OP probably doesn't need a 'regex' which will not strip out unmatched </a>s. (That would be incorrect, but I don't think the OP would care. :))
1

</?a.*?> would work. Replace it with ''

9 Comments

I just made a little change that works for me: /<a.*?>/. Thanks for the help; please edit your answer.
Yes, of course; I merely gave the RE. You would have to add the / delimiters at the start and end if you were using JavaScript, for instance. You would not have to add anything if you were using the C# regex library.
But there is a little problem: the </a> is not stripped.
Are you using POSIX or PCRE, i.e. ereg_replace or preg_replace?
FYI: This also matches elements like <abbr>, <acronym>, <address>, <applet> and <area>.