
Given a string S, find the number of words in that string. For this problem a word is defined as a string of one or more English letters.

Note: Space or any of the special characters like ![,?._'@+] will act as a delimiter.

Input Format: The string will only contain lower case English letters, upper case English letters, spaces, and these special characters: ![,?._'@+].

Output Format: On the first line, print the number of words in the string. The words don't need to be unique. Then, print each word in a separate line.

My code:

    Scanner sc = new Scanner(System.in);
    String str = sc.nextLine();
    String regex = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";
    String[] arr = str.split(regex);
    
    System.out.println(arr.length);
    
    for(int i = 0; i < arr.length; i++)
        System.out.println(arr[i]);

When I submit the code, it passes just over half of the test cases, and I can't see what the test cases are. I'm trying to work out everything that can go wrong: in which situations will the regex I implemented fail?

  • Why are you including a backslash in your regex? It's not in the requirement. Also, you're using [ and ] without escaping them. Commented Jan 21, 2016 at 1:10
  • For characters with special meaning in regex, you need to escape them. As a side note, you might find it easier and clearer to split based off of a character set (for example, [a-z] is the set of all lowercase letters) rather than a series of X or Y or Z cases. Commented Jan 21, 2016 at 1:12
  • I apologize. Didn't know I had to escape the backslash in order to post it here. Commented Jan 21, 2016 at 4:03

1 Answer

You don't escape some special characters in your regex. Let's start with []. Since you don't escape them, the part [|,|?|.|_|'|@|+|] is treated like a set of characters |,?._'@+. This means that your regex doesn't split on [ and ].

For example, x..]y+[z is split into x, ]y and [z.

You can fix that by escaping the brackets. Once they no longer form a character set, ?, . and + are regex metacharacters again, so they need escaping too, and you end up with a proper definition:

    String regex = "( |!|\\[|,|\\?|\\.|_|'|@|\\+|\\])+";
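To make the difference concrete, here is a small sketch that runs the x..]y+[z example through both patterns. The class and method names are made up for illustration; the two pattern strings are the one from the question and the escaped one above.

```java
import java.util.Arrays;

// Hypothetical demo class; only the two pattern strings come from the discussion.
class BracketEscapeDemo {
    // Pattern from the question: the unescaped [ ... ] forms a character class.
    static final String UNESCAPED = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";
    // Escaped alternation: every delimiter is matched literally.
    static final String ESCAPED = "( |!|\\[|,|\\?|\\.|_|'|@|\\+|\\])+";

    static String[] splitUnescaped(String s) {
        return s.split(UNESCAPED);
    }

    static String[] splitEscaped(String s) {
        return s.split(ESCAPED);
    }

    public static void main(String[] args) {
        String input = "x..]y+[z";
        // The brackets survive as part of the tokens.
        System.out.println(Arrays.toString(splitUnescaped(input))); // [x, ]y, [z]
        // The brackets are treated as delimiters.
        System.out.println(Arrays.toString(splitEscaped(input)));   // [x, y, z]
    }
}
```

A side effect of the unescaped version is that | itself ends up inside the character class, so that pattern also splits on |.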

Note that instead of defining alternatives, you could use a set which will make your regex easier to read:

    String regex = "[ !\\[,?._'@+\\]]+";

In this case you only need to escape [ and ].
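As a quick check of the set-based form, the sketch below splits a sample sentence. It assumes the set should contain the space as well as every special character listed in the problem statement; the class name and the sample sentence are illustrative only.

```java
import java.util.Arrays;

// Hypothetical demo class for the character-set form of the delimiter regex.
class DelimiterSetDemo {
    // Assumption: space plus every delimiter from the problem statement.
    // Inside the set, only '[' and ']' are escaped.
    static final String DELIMS = "[ !\\[,?._'@+\\]]+";

    static String[] split(String s) {
        return s.split(DELIMS);
    }

    public static void main(String[] args) {
        String[] words = split("He is a very very good boy, isn't he?");
        System.out.println(words.length);           // 10
        System.out.println(Arrays.toString(words)); // [He, is, a, very, very, good, boy, isn, t, he]
    }
}
```

The trailing ? does not add a token because split drops trailing empty strings by default.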

UPDATE:

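There's also a problem with a leading special character (as in your example ".Hi?there[broski.]@@@@@"). You need to split on it, but that produces an empty string as the first result. I don't think there's a way to use the split function without producing it, but you can mitigate it by removing the leading delimiters before splitting, using the same regex anchored to the start of the string with ^ (without the anchor, replaceFirst would remove the first delimiter anywhere in the string and join two words whenever the input starts with a letter):

    String[] arr = str.replaceFirst("^" + regex, "").split(regex);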
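Two more inputs are worth checking: an empty line and a line made up only of delimiters. In both cases the string is empty after the leading delimiters are removed, and "".split(...) returns an array holding one empty string, so the count would be printed as 1. The sketch below guards against that. It also shows a different technique for comparison, matching runs of letters with a Matcher instead of splitting, which is not part of the original answer. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative, not from the original answer.
class WordSplitSketch {
    // Assumption: space plus every delimiter from the problem statement.
    static final String DELIMS = "[ !\\[,?._'@+\\]]+";
    static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Split-based: drop a leading run of delimiters (anchored with ^), then split.
    static String[] wordsBySplit(String s) {
        String trimmed = s.replaceFirst("^" + DELIMS, "");
        // "".split(...) yields one empty string, so return an empty array instead.
        return trimmed.isEmpty() ? new String[0] : trimmed.split(DELIMS);
    }

    // Match-based alternative: collect every run of English letters.
    static List<String> wordsByMatch(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = wordsBySplit(".Hi?there[broski.]@@@@@");
        System.out.println(words.length); // 3
        for (String w : words) {
            System.out.println(w);        // Hi, there, broski (one per line)
        }
    }
}
```

The match-based version needs no special handling for leading delimiters or empty input, at the cost of not using split at all.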

1 Comment

I appreciate your response. I did a test input of ".Hi?there[broski.]@@@@@" without the quotes. The output printed out 4, a blank line, Hi, there, broski (each on their own line). I assume it printed out the blank line because of the '.' in front of "Hi". How would I fix that?
