Regular expression to remove HTML links [duplicate]

Question

Possible Duplicate:
Regular expression for parsing links from a webpage?
RegEx match open tags except XHTML self-contained tags

I need a regular expression to strip HTML <a> tags. Here is a sample:

<a href="xxxx" class="yyy" title="zzz" ...> link </a>

should be converted to

 link

Do you 'need' a regular expression?

– Matt Fenwick, Commented Sep 23, 2011 at 16:46
@josh3736 I will feast on your Unicorn's blood.

– Mateen Ulhaq, Commented Sep 26, 2011 at 23:18
In what language? HTML doesn't have regular expressions.

– Bill the Lizard, Commented Sep 29, 2011 at 1:53

Bill Criswell · Accepted Answer · 2011-09-26 15:07:59Z

13

I think you're looking for: </?a(|\s+[^>]+)>
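The thread never settles on a language, so as a rough sketch here is how this pattern could be applied in Python with `re.sub`. The choice of Python and the names `ANCHOR_TAG` and `strip_anchor_tags` are assumptions for illustration, not part of the answer.

```python
import re

# Pattern from the answer above: an opening or closing <a> tag,
# either bare or followed by whitespace and at least one attribute character.
ANCHOR_TAG = re.compile(r"</?a(|\s+[^>]+)>")


def strip_anchor_tags(html: str) -> str:
    """Remove <a ...> and </a> tags, keeping the text between them."""
    return ANCHOR_TAG.sub("", html)


sample = '<a href="xxxx" class="yyy" title="zzz"> link </a>'
print(repr(strip_anchor_tags(sample)))  # prints ' link '
```

Only the tags are replaced, so the link text (including its surrounding spaces) is left in place, which is what the question asks for.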

edited Sep 26, 2011 at 15:07

answered Sep 23, 2011 at 16:40

Bill Criswell


3 Comments

Bill Criswell Over a year ago

When have you ever seen just an <a> tag?

Bill Criswell Over a year ago

I edited it to account for strange cases like that anyway.

Mateen Ulhaq Over a year ago

Doesn't match < a> or < /a>.

rbrignoni · Accepted Answer · 2011-09-24 21:23:59Z

3

Answers given above would match valid HTML tags such as <abbr>, <address>, or <applet> and strip them out erroneously. A better regex that matches only anchor tags would be:

</?a(?:(?= )[^>]*)?>
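To make the difference concrete, here is a small Python comparison (Python and the sample string are assumptions for illustration) between the lookahead pattern above and the looser `</?a.*?>` pattern posted elsewhere in this thread.

```python
import re

# Lookahead version: after "a", either the tag ends or a space must follow.
ANCHOR_ONLY = re.compile(r"</?a(?:(?= )[^>]*)?>")
# Looser version: anything may follow the "a", including "bbr".
LOOSE = re.compile(r"</?a.*?>")

html = '<abbr title="x">HTML</abbr> and <a href="y">a link</a>'

print(ANCHOR_ONLY.sub("", html))  # prints: <abbr title="x">HTML</abbr> and a link
print(LOOSE.sub("", html))        # prints: HTML and a link
```

Note that the lookahead checks for a literal space only, so a tag whose attributes start after a tab or a newline would not be matched by this pattern.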

answered Sep 24, 2011 at 21:23

rbrignoni


1 Comment

GaryP Over a year ago

I've used this one with the free edition of Sublime Text 3. Worked best in my case.

ridgerunner · Accepted Answer · 2011-09-26 15:36:36Z

2

Here's what I would use:

</?a\b[^>]*>
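A sketch of this pattern in Python follows. The `re.IGNORECASE` flag and the sample markup are my additions (assumptions, not part of the answer); they show that the word boundary keeps `<address>` intact while upper-case tags and attributes spread over several lines are still caught.

```python
import re

# \b requires the tag name to end right after "a", so <address> is not matched.
# IGNORECASE is an added assumption so that <A HREF=...> is handled too.
ANCHOR_TAG_WB = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)

html = '<A HREF="x"\n   class="y">link</A> in an <address>block</address>'

print(ANCHOR_TAG_WB.sub("", html))  # prints: link in an <address>block</address>
```

Since `[^>]*` stops at the first `>`, an attribute value that itself contains `>` would still cut the match short. Also, because `-` is not a word character, a custom element whose name starts with `a-` would be matched as well.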

answered Sep 26, 2011 at 15:36

ridgerunner


Community · Accepted Answer · 2017-05-23 11:53:14Z

2

You're going to have to use this hackish solution iteratively, and it probably won't even work perfectly for complicated HTML:

<a(\s[^>]*)?>.*?(</a>)?

Alternatively, you can try one of the existing HTML sanitizers/parsers out there.

HTML is not a regular language; any regex we give you will not be 'correct'. It's impossible. Even Jon Skeet and Chuck Norris can't do it. Before I lapse into a fit of rage, like @bobince [in]famously once did, I'll just say this:

Use an HTML parser.

(Whatever they're called.)
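For what it's worth, here is a minimal sketch of the parser route using `html.parser.HTMLParser` from Python's standard library. Python is an assumption (the question names no language), and `AnchorStripper` and `strip_anchors` are hypothetical names. The sketch covers the common node types only and writes end tags in lower case.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emit the parsed document, dropping <a> start and end tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # tag names arrive lower-cased
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are re-emitted as written.
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def strip_anchors(html: str) -> str:
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = '<p>See <a href="x" title="a > b">this &amp; that</a> or <abbr>HTML</abbr>.</p>'
print(strip_anchors(sample))
# prints: <p>See this &amp; that or <abbr>HTML</abbr>.</p>
```

Unlike the regexes in this thread, the parser copes with a `>` inside a quoted attribute value, and `<abbr>` is never at risk because tags are compared by name rather than by prefix.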

EDIT:

If you want to 'incorrectly' strip out </a>s that don't have any <a>s as well, do this:

</?[a\s]*[^>]*>
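A quick check in Python (an assumption, since the thread names no language; the sample string is made up) suggests this last pattern reaches further than stray `</a>` tags. Because `[a\s]*` may match nothing and `[^>]*` then accepts any tag name, it appears to match every tag in the input.

```python
import re

# Pattern from the EDIT above. [a\s]* is optional, so [^>]* can swallow
# any tag name, not just "a".
OVERBROAD = re.compile(r"</?[a\s]*[^>]*>")

html = '<p><a href="x">link</a> and <b>bold</b></p>'

print(OVERBROAD.sub("", html))  # prints: link and bold
```

If only anchor tags should go, one of the stricter patterns from the other answers is probably a safer starting point.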

edited May 23, 2017 at 11:53

CommunityBot


answered Sep 25, 2011 at 3:00

Mateen Ulhaq


4 Comments

ridgerunner Over a year ago

Your regex: <a(\s[^>]*)?>(</a>)? does not match </a> closing tags (except for the case where the A element is empty).

Mateen Ulhaq Over a year ago

@ridgerunner Since regexes don't have memory, putting a .*? in between the two is the best I can do. It'll break down for more complicated HTML.
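The behaviour discussed in these two comments can be checked with a short Python sketch (Python and the sample strings are assumptions for illustration). Since `.*?` is lazy and `(</a>)?` is optional, the match can end right after the opening tag, so the closing tag is only consumed when the element is empty.

```python
import re

# Pattern from the answer, including the lazy .*? between the two tags.
PAIR = re.compile(r"<a(\s[^>]*)?>.*?(</a>)?")

print(PAIR.sub("", '<a href="x">link</a>'))  # prints: link</a>
print(repr(PAIR.sub("", "<a></a>")))         # prints: ''
```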

Bill Criswell Over a year ago

Just curious: Why are you worried about the tag's text at all?

Mateen Ulhaq Over a year ago

@BillCriswell Oh, damn, I just realized the OP probably doesn't need a 'regex' which will not strip out unmatched </a>s. (That would be incorrect, but I don't think the OP would care. :))

arviman · Accepted Answer · 2011-09-23 16:44:13Z

1

</?a.*?> would work. Replace it with ''

answered Sep 23, 2011 at 16:44

arviman


9 Comments

ShirazITCo Over a year ago

I just made a small change that works for me: /<a.*?>/. Thanks for the help. Please edit your answer.

arviman Over a year ago

Yes, of course; I only gave the regex itself. You would have to add the / delimiters at the start and end if you were using JavaScript, for instance. You would not have to add anything if you were using the C# regex library.

ShirazITCo Over a year ago

But there is a small problem: the </a> is not stripped.

arviman Over a year ago

Are you using POSIX or PCRE, i.e. ereg_replace or preg_replace?

Bill Criswell Over a year ago

FYI: This also matches elements like <abbr>, <acronym>, <address>, <applet> and <area>.
