I have a string containing words separated by one or more blank characters (space, tab, etc.). I'm trying to write the most optimized procedure possible that outputs the string with the same words in the same order, but separated by only one space.

I'm trying this but I still have a problem:

public class Test {  
    public static void main(String args[]) {  
        String str = "word1, word2 word3@+word4?.word5.word6";  
        Stream<String> stream = Arrays.stream(input.split( "[, ?.@]+_" ));  
            .stream().collect(Collectors.joining(" "));
    }  
}
5
  • "but I still have a problem". So which one is it? Commented Feb 2, 2021 at 23:08
  • If you want computationally fast, and you know the separators are always just a single character as in your example, use a char array and manually replace Commented Feb 2, 2021 at 23:11
  • We can have one or more blank characters. Thanks for your comment Jems , I've edited my example. Commented Feb 2, 2021 at 23:15
  • What's the problem? Is it the stray _? I'm guessing [, ?.@]+_ should be [, ?.@_]+. Commented Feb 2, 2021 at 23:20
  • Why use Stream for this? A simple s = s.replaceAll("\\P{Alnum}+", " ").trim() can do it. Commented Feb 3, 2021 at 0:04
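
The one-liner from the last comment can be wrapped in a small helper. This is a minimal sketch rather than the asker's code: the class name AlnumNormalizer and the method normalize are made up for illustration, and the pattern is compiled once on the assumption that the method is called many times.

```java
import java.util.regex.Pattern;

public class AlnumNormalizer {
    // Compiled once; String.replaceAll compiles its regex again on every call.
    private static final Pattern NON_ALNUM = Pattern.compile("\\P{Alnum}+");

    // Hypothetical helper: replaces each run of non-alphanumeric characters
    // with one space, then strips the space left at either end.
    public static String normalize(String s) {
        return NON_ALNUM.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints: word1 word2 word3 word4 word5 word6
        System.out.println(normalize("word1, word2 word3@+word4?.word5.word6"));
    }
}
```

Unlike \W, the \P{Alnum} class also treats an underscore as a separator.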

3 Answers

1

Split on one or more non-word characters (\W+) and join the pieces with Collectors.joining:

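import java.util.Arrays;
import java.util.stream.Collectors;

String input = "word1, word2 word3@word4?word5.word6";

// Split on runs of non-word characters, then join the words with single spaces.
String str = Arrays.stream(input.split("\\W+"))
                   .collect(Collectors.joining(" "));
System.out.println(str); // word1 word2 word3 word4 word5 word6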

2 Comments

Why not just str = input.replaceAll("\\W+", " ") and lose that Stream overhead?
That works too. As for "why Stream?": I should have asked myself that question before answering.
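
The two variants discussed in these comments agree on the sample input but not on every input, because String.split drops trailing empty strings while keeping a leading one. A small sketch of the difference follows; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SplitVersusReplace {
    // Variant from the answer: split on non-word runs, then join with a space.
    public static String viaStream(String input) {
        return Arrays.stream(input.split("\\W+")).collect(Collectors.joining(" "));
    }

    // Variant from the comment: replace each non-word run with a space.
    public static String viaReplace(String input) {
        return input.replaceAll("\\W+", " ");
    }

    public static void main(String[] args) {
        String input = ", word1, word2.";
        System.out.println("[" + viaStream(input) + "]");  // prints: [ word1 word2]
        System.out.println("[" + viaReplace(input) + "]"); // prints: [ word1 word2 ]
    }
}
```

Calling trim() on either result removes the difference.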
1

Is this your requirement?

String s = "abc  a    b";
System.out.println(s.replaceAll("\\s+", " ")); // abc a b

Or, if any run of non-alphanumeric characters should also count as a word boundary and be removed, use this:

String s = "abc  a+.?b    c";
// \W+ already matches whitespace, so a second replaceAll("\\s+", " ") is not needed.
System.out.println(s.replaceAll("\\W+", " ")); // abc a b c

Thank you @John Kugelman for the fix.
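
For reference, the two patterns in this answer differ in what they treat as a separator: \s+ matches only whitespace, while \W+ matches anything outside [a-zA-Z0-9_], so an underscore stays part of a word. A short sketch, with class and method names chosen only for illustration:

```java
public class SeparatorPatterns {
    // Collapses runs of whitespace only; punctuation is left in place.
    public static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Collapses runs of non-word characters (whitespace and punctuation alike).
    public static String collapseNonWord(String s) {
        return s.replaceAll("\\W+", " ");
    }

    public static void main(String[] args) {
        String s = "abc \t a+.?b_c";
        System.out.println(collapseWhitespace(s)); // prints: abc a+.?b_c
        System.out.println(collapseNonWord(s));    // prints: abc a b_c
    }
}
```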

0

For fast computation, I think we can walk the string with String.charAt(i) and compare each character's ASCII value:

  • Digits 0 to 9 have ASCII values 48 to 57
  • Capital letters A to Z have ASCII values 65 to 90
  • Small letters a to z have ASCII values 97 to 122
  • Space (" ") has ASCII value 32

Characters in the first three ranges are kept. Every other character, including the space, is treated as a separator, and each run of separators is replaced by a single space.

String str = "word1,     word2 word3@+word4?.word5.word6";
// StringBuilder avoids creating a new String on every append.
StringBuilder newStr = new StringBuilder(str.length());
boolean inWord = false; // true while the previous character belonged to a word
for (int i = 0; i < str.length(); i++) {
    char c = str.charAt(i);
    boolean alnum = (c >= '0' && c <= '9')   // ASCII 48 to 57
                 || (c >= 'A' && c <= 'Z')   // ASCII 65 to 90
                 || (c >= 'a' && c <= 'z');  // ASCII 97 to 122
    if (alnum) {
        newStr.append(c);
        inWord = true;
    } else if (inWord) {
        // First separator after a word: emit exactly one space.
        newStr.append(' ');
        inWord = false;
    }
}
// trim() drops the space left behind when the input ends with a separator.
System.out.println(newStr.toString().trim());

Output:

word1 word2 word3 word4 word5 word6
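
The same single-pass idea can also be written against a preallocated char[], as one of the comments on the question suggested. This is a sketch under assumptions, not the answer's code: the names SinglePassCollapse and collapse are hypothetical, and Character.isLetterOrDigit is swapped in for the ASCII range checks so that non-ASCII letters also count as word characters.

```java
public class SinglePassCollapse {
    // Hypothetical helper: keeps letters and digits, and writes one space
    // between words no matter how many separator characters lie between them.
    public static String collapse(String str) {
        // Each written space corresponds to at least one skipped separator,
        // so the output is never longer than the input.
        char[] out = new char[str.length()];
        int len = 0;
        boolean pendingSpace = false; // a separator was seen since the last word character
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Emit the delayed space only between words, never at the start.
                if (pendingSpace && len > 0) {
                    out[len++] = ' ';
                }
                pendingSpace = false;
                out[len++] = c;
            } else {
                pendingSpace = true;
            }
        }
        return new String(out, 0, len);
    }

    public static void main(String[] args) {
        // prints: word1 word2 word3 word4 word5 word6
        System.out.println(collapse("word1,     word2 word3@+word4?.word5.word6"));
    }
}
```

Delaying the space until the next word character means leading and trailing separators produce no space at all, so no trim() is needed. The loop works on UTF-16 char values, so characters outside the Basic Multilingual Plane are treated as separators.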
