1

I'm new to regex, so sorry for the beginner question.
My problem is that I want to split a String into groups of data.

What I want to get is the following:

  1. a-z A-Z or a-z A-Z 0-9 (e.g. abc, bzc15, but not 1abc or 14bc)
  2. 0-9 (e.g. 1, 23, 56, etc.)
  3. these operators + * - /
  4. the white space
  5. ( and )

I want to group them in an array and preserve their position if possible.

Ex:

String test = "a + b + 6";

The result should be something like this

Array[0] = a
Array[1] = White Space
Array[2] = +
Array[3] = White Space
Array[4] = b
Array[5] = White Space
Array[6] = +
Array[7] = White Space
Array[8] = 6

Is this possible? If so, what pattern should I use?
Any help will be appreciated.

4
  • you can use test.toCharArray() Commented May 1, 2013 at 12:19
  • Maybe get each element of the character array, convert it to an int, then look up its representation in an ANSI chart and output that? Might be easier than regex. Commented May 1, 2013 at 12:19
  • If I use test.toCharArray(), I can't group data for input like ab + b. Commented May 1, 2013 at 12:29
  • Please see my answer. You will need a parser to accomplish the task you are describing, unless you want to force your users to add a lot of extra whitespace. Commented May 1, 2013 at 12:52

4 Answers 4

1

Try this:

String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");

I deduced that you want to split at the start or the end of each run of whitespace. The regex has to be zero-width, otherwise the whitespace would be consumed. This is achieved with look-behinds and look-aheads, which are zero-width. The character classes used inside the look-arounds are:

  • \s means "a whitespace character"
  • \S means "a non-whitespace character"

Then there are the look-arounds:

  • (?<=regex) asserts that the preceding input matches regex
  • (?=regex) asserts that the following input matches regex

Then there's the OR:

  • (regex1)|(regex2) means "matches either regex1 or regex2"
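
To see the pieces working together, here is a minimal sketch that applies the pattern to the string from the question. The class name WhitespaceBoundarySplit and the helper split are made up for this illustration; only String.split and the regex come from the answer.

```java
class WhitespaceBoundarySplit {
    // Hypothetical helper: splits at every zero-width boundary between
    // a non-whitespace character and a whitespace character (either order).
    static String[] split(String input) {
        return input.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");
    }

    public static void main(String[] args) {
        // Brackets make the whitespace tokens visible in the output.
        for (String token : split("a + b + 6")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Because nothing is consumed, the spaces come back as array elements of their own. Input that contains no whitespace at all, such as 1+2, is returned as a single element.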

2 Comments

Wow, you got it, bro. But can you explain the pattern to me?
I didn't expect the pattern to be this short. Many thanks.
0

Try this:

char[] charArr = test.toCharArray();

Example:

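import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String test = "a + b + 6";
        // Splits the string into individual characters, spaces included.
        char[] charArr = test.toCharArray();
        System.out.println(Arrays.toString(charArr));
    }
}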

Output:

[a,  , +,  , b,  , +,  , 6]

1 Comment

If I use test.toCharArray(), I can't group data for input like ab + b. Thanks anyway.
0

I am guessing here, but I think you want to parse mathematical statements; in other words, you are trying to perform lexical analysis (http://en.wikipedia.org/wiki/Lexical_analysis).

You might want to consider one of Java's fully developed lexical analyzer / parser generators for an easy solution. The only one that I have worked with is CUP (http://www.cs.princeton.edu/~appel/modern/java/CUP/), and it is quite easy to use.

Otherwise you will need to write some custom parser code.

String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))"); or char[] charArr = test.toCharArray(); are inappropriate here, since the following are cases where you will get improperly tokenized results:

Input       Expected result     Result of bad solution
(2 + 4)     [(,2,+,4,)]         [(2,+,4)]
1+2         [1,+,2]             [1+2]
2 + 14(5)   [2,+,14,(,5,)]      [2,+,14(5)]
3a          [3,a]               [3a]
abs(5 + 6)  [abs,(,5,+,6,)]     [abs(5,+,6)]

*Basically, this covers any input that does not have an explicit space between tokens, which should be allowed but which the other suggested solutions do not support.
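
As a rough idea of what such custom code could look like, here is a minimal sketch of a hand-rolled tokenizer built on java.util.regex. It is one possible approach, not the only one: the class name SimpleLexer, the method tokenize, and the exact token classes are assumptions derived from the five categories listed in the question. Whitespace runs are kept as tokens because the question asks for them; they could be filtered out afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleLexer {
    // One alternative per token class from the question (assumed definitions).
    private static final Pattern TOKEN = Pattern.compile(
            "[a-zA-Z][a-zA-Z0-9]*"  // identifier: a letter, then letters or digits
            + "|[0-9]+"             // number
            + "|[+*/-]"             // operator
            + "|\\s+"               // run of whitespace
            + "|[()]");             // parenthesis

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Require a token to start exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abs(5 + 6)"));
        System.out.println(tokenize("2 + 14(5)"));
    }
}
```

Matching from the current position with lookingAt means tokens no longer need spaces between them, and a character outside the five classes is reported instead of being silently skipped.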

1 Comment

I see you have a good point there. But my task is to develop my own lexical analyzer, so I really can't use an existing project. Thanks anyway. I want to upvote your post, but why can't I?
0

I think this regex will do what you want:

"((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))"

It splits the String at the following locations:

  • After a digit [0-9] and before a letter [a-zA-Z].
  • Between a word-character [a-zA-Z_0-9] and a non-word character.
  • Between two non-word characters.
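
A small sketch for trying the pattern on a few inputs follows. The class name WordBoundarySplit and the helper split are made up for this illustration; the regex itself is the one given above.

```java
import java.util.Arrays;

class WordBoundarySplit {
    // The pattern from this answer, unchanged.
    private static final String BOUNDARY =
            "((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))";

    // Hypothetical helper wrapping String.split.
    static String[] split(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // Inputs with and without explicit spaces between tokens.
        for (String input : new String[] {"a + b + 6", "abs(5 + 6)", "1+2", "3a"}) {
            System.out.println(input + " -> " + Arrays.toString(split(input)));
        }
    }
}
```

Two things to keep in mind when using it: consecutive spaces are returned as separate one-character elements, and an identifier with a digit in the middle, such as a1b, is split into a1 and b, because the first alternative splits after any digit that is followed by a letter.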
