I've tried this for a couple of hours and wasn't able to do this correctly; so I figured I'd post it here. Here's my problem.

Given a string in Java:

"this is <a href='something'>one \nlink</a> some text <a href='fubar'>two \nlink</a> extra text"

Now I want to strip the link tags out of this string using regular expressions, so the resulting string should look like:

"this is one \nlink some text two \nlink extra text"

I've tried all kinds of things with Java regular expressions (capturing groups, greedy quantifiers, you name it) and still can't get it to work quite right. If there's only one link tag in the string, I can get it to work easily. However, my string can have multiple links embedded in it, which is what's preventing my expression from working. Here's what I have so far: (?s).*(<a.*>(.*)</a>).*

Note that the text inside the link can be of variable length, which is why I have the .* in the expression.

If somebody can give me a regular expression that works, I'll be extremely grateful. Short of looping through each character and removing the links by hand, I can't find a solution.

1
  • If you want to follow standards, (X)HTML attributes are surrounded by double quotes ("), not single quotes ('). Commented Dec 29, 2009 at 20:24

3 Answers

3

Sometimes it's easier to do it in 2 steps:

String s = "this is <a href='something'>one \nlink</a> some text <a href='fubar'>two \nlink</a> extra text";
// First remove every opening <a ...> tag, then every closing </a> tag.
String result = s.replaceAll("<a[^>]*>", "").replaceAll("</a>", "");
// Result: "this is one \nlink some text two \nlink extra text"

2

Here's the way I usually match tags:

<a .*?>|</a>

and replace with an empty string.
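As a rough sketch of how that could look in Java (the class and method names here are made up for illustration), the pattern can be compiled once and applied with replaceAll:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the original answer.
final class AnchorTagStripper {
    // Matches an opening <a ...> tag (lazily, up to the first '>') or a closing </a> tag.
    private static final Pattern ANCHOR_TAG = Pattern.compile("<a .*?>|</a>");

    // Removes the tags themselves and leaves the anchor text in place.
    static String strip(String html) {
        return ANCHOR_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "this is <a href='something'>one \nlink</a> some text "
                 + "<a href='fubar'>two \nlink</a> extra text";
        System.out.println(strip(s));
    }
}
```

Because the pattern only matches the tags and never the text between them, the newline inside the anchor text does not matter here. Note that it assumes there is a space after "<a", so a bare <a> would be left alone.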

Alternatively, instead of removing the tag, you might comment it out. The match pattern would be the same, but the replacement would be:

<!--\0-->

or, in engines that use dollar-sign references (this is the form Java's replaceAll expects):

<!--$0-->
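In Java, the commenting-out variant might look something like this. It is only a sketch with an invented class name, and it assumes the tags themselves don't contain "--", which would break the generated comments:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
final class AnchorTagCommenter {
    private static final Pattern ANCHOR_TAG = Pattern.compile("<a .*?>|</a>");

    // Wraps every matched tag in an HTML comment; $0 refers to the whole match.
    static String commentOut(String html) {
        return ANCHOR_TAG.matcher(html).replaceAll("<!--$0-->");
    }

    public static void main(String[] args) {
        System.out.println(commentOut("x <a href='u'>y</a> z"));
        // prints: x <!--<a href='u'>-->y<!--</a>--> z
    }
}
```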

If you want to have a reference to the anchor text, use this match pattern:

<a .*?>(.*?)</a>

and reference group 1 instead of group 0 in the replacement (that is $1 in Java).
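A possible Java version of the capturing-group approach is sketched below. The class name is invented, and the inline (?s) flag from the question is kept so that (.*?) can span the newline inside the anchor text:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
final class AnchorUnwrapper {
    // (?s) lets '.' match line terminators; group 1 captures the anchor text.
    private static final Pattern ANCHOR = Pattern.compile("(?s)<a .*?>(.*?)</a>");

    // Replaces each whole <a ...>text</a> element with just its text.
    static String unwrap(String html) {
        return ANCHOR.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "this is <a href='something'>one \nlink</a> some text "
                 + "<a href='fubar'>two \nlink</a> extra text";
        System.out.println(unwrap(s));
    }
}
```

The lazy quantifiers are what keep the two links separate: each (.*?) stops at the first </a> it can reach instead of running on to the last one.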

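Note: By default, . does not match line terminators in Java, so a pattern such as <a .*?>(.*?)</a> fails when the anchor text contains a newline, as it does in the question. Pass Pattern.DOTALL (or prefix the pattern with (?s)) so that . matches across lines. Pattern.MULTILINE does not help here; it only changes where ^ and $ match. Here's a Java example:

Pattern aPattern = Pattern.compile(regexString, Pattern.DOTALL);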
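To check which flag actually affects ., a small comparison like the following can help. It is a sketch with made-up names that runs the same pattern with no flags, with Pattern.MULTILINE, and with Pattern.DOTALL against a link whose text contains a newline:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class DotFlagComparison {
    private static final String REGEX = "<a .*?>(.*?)</a>";

    // Replaces each matched link with its anchor text, using the given flags.
    static String unwrap(String html, int flags) {
        return Pattern.compile(REGEX, flags).matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "some <a href='fubar'>two \nlink</a> text";
        // No flags: '.' stops at the newline, so nothing matches.
        System.out.println(unwrap(s, 0).equals(s));                 // true
        // MULTILINE only changes ^ and $, so still nothing matches.
        System.out.println(unwrap(s, Pattern.MULTILINE).equals(s)); // true
        // DOTALL lets '.' cross the newline, so the link is unwrapped.
        System.out.println(unwrap(s, Pattern.DOTALL).equals("some two \nlink text")); // true
    }
}
```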

1

Off the top of my head

"<a [^>]*>|</a>"
