I have this string:

String a = "$$bar$55^$$";

I want to remove all symbols. I wrote this regex:

String b = a.replaceAll("(?<=[^[\\p{Alpha}][\\p{Digit}]])", "");

But the string comes back unchanged:

$$bar$55^$$

But I want to get this string:

bar55

What am I doing wrong? How can I filter out all characters except letters and numbers?

In Oracle, this works for me:

select regexp_replace('$$bar$55^$$','[^[:alpha:][:digit:]]*') from dual;
  • Just a note: [[:alpha:]] translates to \p{Alpha}, not [\\p{Alpha}]. POSIX character classes can only be used inside bracket expressions (Oracle uses a POSIX regex engine), whereas shorthand character classes in Java regex do not have to be wrapped with [...] individually. Also, [[:alpha:][:digit:]] = [[:alnum:]]. Hence, I suggest \P{Alnum} to match any char other than an alphanumeric one. You may also use "[^\\p{Alpha}\\p{Digit}]+"; there is no need for the nested character classes and the resulting union. Commented Feb 8, 2019 at 13:13

1 Answer

You are using a lookbehind, which is a non-consuming (zero-width) pattern: it only matches a position inside the string, so the match value is always empty, and replacing an empty match with an empty string changes nothing. Use a consuming pattern instead:

String b = a.replaceAll("\\P{Alnum}+", "");

The \\P{Alnum}+ pattern matches one or more chars other than ASCII alphanumeric chars. Also, see Predefined Character classes.
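To make the difference between the zero-width lookbehind and the consuming pattern visible, here is a minimal sketch. The class name LookbehindDemo and the helpers totalMatchedLength and strip are made up for illustration; only java.util.regex and String.replaceAll come from the JDK. The lookbehind is a simplified form of the one in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class LookbehindDemo {

    // Sums the lengths of all matches of `regex` in `input`.
    // A zero-width pattern (such as a lookbehind) always contributes 0.
    static int totalMatchedLength(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int total = 0;
        while (m.find()) {
            total += m.end() - m.start();
        }
        return total;
    }

    // Removes every run of chars that are not ASCII letters or digits.
    static String strip(String input) {
        return input.replaceAll("\\P{Alnum}+", "");
    }

    public static void main(String[] args) {
        String a = "$$bar$55^$$";

        // The lookbehind finds positions only, so nothing is consumed.
        System.out.println(totalMatchedLength("(?<=\\P{Alnum})", a)); // 0

        // The consuming pattern covers the six non-alphanumeric chars.
        System.out.println(totalMatchedLength("\\P{Alnum}+", a)); // 6

        // Replacing empty matches with "" leaves the string as it was.
        System.out.println(a.replaceAll("(?<=\\P{Alnum})", "").equals(a)); // true

        System.out.println(strip(a)); // bar55
    }
}
```

The lookbehind still "matches" at several positions, but since each match has length zero there is nothing for replaceAll to remove.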

Alternatively, you may use

String b = a.replaceAll("[^\\p{L}\\p{N}]+", "");

This will remove chunks of 1 or more chars other than Unicode letters and numbers.
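The choice between the ASCII-only class and Unicode property classes only matters once the input contains non-ASCII letters. The sketch below compares the two, assuming \p{L} and \p{N} as the letter and number categories to keep. The class name UnicodeStripDemo, the method names, and the sample string with "ä" (written as \u00e4) are made up for illustration.

```java
// Hypothetical demo class, not part of the original answer.
class UnicodeStripDemo {

    // Keeps ASCII letters and digits only; \P{Alnum} is ASCII-based
    // unless Pattern.UNICODE_CHARACTER_CLASS is enabled.
    static String stripAscii(String s) {
        return s.replaceAll("\\P{Alnum}+", "");
    }

    // Keeps any Unicode letter (\p{L}) or number (\p{N}).
    static String stripUnicode(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    public static void main(String[] args) {
        // "\u00e4" is the letter a-umlaut, escaped to avoid source-encoding issues.
        String s = "$$b\u00e4r$55^$$";

        // The ASCII-only pattern also drops the non-ASCII letter.
        System.out.println(stripAscii(s).equals("br55")); // true

        // The Unicode-aware pattern keeps it.
        System.out.println(stripUnicode(s).equals("b\u00e4r55")); // true

        // On pure ASCII input both give the same result.
        System.out.println(stripUnicode("$$bar$55^$$")); // bar55
    }
}
```

If the data is known to be ASCII, \\P{Alnum}+ is the shorter option. Otherwise the Unicode categories are the safer choice.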

1 Comment

FYI, see the Java demo online. "$$bar$55^$$".replaceAll("\\P{Alnum}+", "") yields bar55.
