0

I get a simple string from which I want to extract some values. The values are separated by whitespace characters as follows:

abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40

I want to get those values: abc, 0.00, 11.00,...

I tried this:

    String line = "abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40";
    String regex = "^([\\w\\.]*)\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\$";
    Pattern ptrn = Pattern.compile(regex);
    Matcher matcher = ptrn.matcher(line); 
    if(matcher.find())
    {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        System.out.println(matcher.group(3));
        System.out.println(matcher.group(4));
        System.out.println(matcher.group(5));
        System.out.println(matcher.group(6));
        System.out.println(matcher.group(7));           
        System.out.println(matcher.group(8));
        System.out.println(matcher.group(9));
        System.out.println(matcher.group(10));
        System.out.println(matcher.group(11));
        System.out.println(matcher.group(12));          
    }

I am getting the following output:

abc
0
0
0
0
0
0
6
1
4
9
0

What am I doing wrong?

7
  • 3
    You really feel that you need ^([\\w\\.]*)\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\\s+([\\w\\.])*\$ just to split this string into its parts? Commented Sep 9, 2015 at 10:42
  • 1
    You can use String[] arr = line.split("\\s+"); Commented Sep 9, 2015 at 10:45
  • @Takendarkk Yes, because I don't know what I will be getting in this regex variable. I will be getting the regex from a user input file. This is related to generalizing the data extraction process. Commented Sep 9, 2015 at 10:45
  • @anubhava I know I can use string methods, but both the string to be parsed and how to parse it change from time to time, so I am generalizing it with regex. Commented Sep 9, 2015 at 10:46
  • 1
    Lol you call that "generalizing"? Poor approach. Commented Sep 9, 2015 at 10:49

5 Answers

4
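  • Firstly, your example won't compile: the \$ at the end of the pattern String is not a valid escape sequence in a Java string literal (use \\$ or just $).
  • Secondly, you misplaced the greedy * (zero-or-more) quantifier in every group after the first one. You can fix it by using ([\\w\\.]*) instead of ([\\w\\.])*
  • The difference lies in what is grouped: ([\\w\\.])* repeats a group that captures a single character, so after the match the group holds only the last character it consumed (which is why you see 0, 6, 1, ...), whereas ([\\w\\.]*) captures the whole run of characters.
  • Thirdly, you're probably better off splitting your input by whitespace and iterating over the array elements.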

Example

String line = "abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40";
String[] items = line.split("\\s+");
System.out.println(Arrays.toString(items));

Output

[abc, 0.00, 11.00, 0.00, 4.50, 0.00, 124.00, 27.56, 0.01, 1.44, 0.89, 0.40]

Note

As your array is (0-)indexed, you can retrieve each item by its index, e.g. items[0], items[1], ... items[items.length - 1].
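
To make the second point concrete, here is a minimal sketch of how the quantifier placement changes what group 1 holds. The class name QuantifierPlacementDemo and the helper firstGroup are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierPlacementDemo {

    // Hypothetical helper: returns group 1 when the regex matches the
    // whole input, or null when it does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String value = "27.56";
        // Quantifier outside: the one-character group is repeated and
        // only its last repetition is kept.
        System.out.println(firstGroup("([\\w\\.])*", value)); // prints 6
        // Quantifier inside: the group spans the whole run of characters.
        System.out.println(firstGroup("([\\w\\.]*)", value)); // prints 27.56
    }
}
```

The first call reproduces the single characters seen in the question's output; the second returns the complete value.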


10 Comments

Unfortunately I can't use string methods. This is more about automated data extraction from different types of files. I don't want to write custom code for each file type, so I wrote code that takes each line of a file, matches a given regex against it, and captures its groups as the values to extract. This allows more flexibility, as the user can specify the data file path and the regex to parse that file in a separate config file. Now I don't have to write custom parsing code for each file, just a config file specifying the regex to use.
@Mahesha999 but why can't you use String methods? Is this homework?
@Mahesha999 Your claim that This allows more flexibility is just flat out wrong. You are adding so much complexity for no reason.
@Takendarkk I agree with you within the context so far. By the way, the split token could always be made variable, if you have differently separated inputs (for simple cases at least).
@Mena lol no, this is not homework. If I use String.split() then the user will be forced to provide the regex for splitting each line of the file. And if the separator is not the same throughout, he will be forced to provide multiple splitters. Though this is possible, I am going with group capturing because the user will anyway first specify a regex for the whole line, to indicate which lines in a file need to be parsed. Then he only needs to mark the groups to be captured on that line, so the regex will mostly be the same.
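
For the config-driven use case described in these comments, the number of groups is not known in advance, so hardcoding group(1) through group(12) will not generalize. A rough sketch, assuming the regex arrives as a plain string (the class ConfiguredLineParser, the method extractGroups, and the sample regex are hypothetical stand-ins, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfiguredLineParser {

    // Applies a user-supplied regex to one line and returns every captured
    // group in order, or an empty list when the line does not match.
    static List<String> extractGroups(String regex, String line) {
        List<String> values = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        if (matcher.matches()) {
            // groupCount() excludes group 0 (the whole match), so start at 1.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed example regex standing in for one read from a config file.
        String regex = "(\\w+)\\s+([\\d.]+)\\s+([\\d.]+)";
        System.out.println(extractGroups(regex, "abc   0.00   11.00"));
    }
}
```

Here matches() requires the regex to cover the entire line, which doubles as the "should this line be parsed" check; find() would be the choice if partial matches are acceptable.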
0

Your regexp is wrong. The quantity qualifier must be put inside the group delimiter, not outside; the first group is already correct. This is the corrected regexp:

([\w\.]*)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)\s+([\w\.]+)

Tested on: https://regex101.com/
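
The regexp above is written as a regex tester expects it. In a Java string literal every backslash has to be doubled. A small sketch, assuming the same 12-column line as in the question; the class CorrectedRegexDemo and its methods buildRegex and capture are invented for this example, and the pattern is assembled in a loop only to avoid typing the repeated part eleven times:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CorrectedRegexDemo {

    // Builds the regexp above: one ([\w\.]*) group followed by
    // (columns - 1) repetitions of \s+([\w\.]+).
    static String buildRegex(int columns) {
        StringBuilder regex = new StringBuilder("([\\w\\.]*)");
        for (int i = 1; i < columns; i++) {
            regex.append("\\s+([\\w\\.]+)");
        }
        return regex.toString();
    }

    // Returns the captured groups, or an empty array if nothing matches.
    static String[] capture(String line, int columns) {
        Matcher matcher = Pattern.compile(buildRegex(columns)).matcher(line);
        if (!matcher.find()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = matcher.group(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40";
        for (String value : capture(line, 12)) {
            System.out.println(value);
        }
    }
}
```

With the quantifier inside each group, this prints abc followed by the eleven complete numbers, one per line.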

2 Comments

That precisely points out the stupid mistake in my regex: The quantity qualifier must be put inside the group delimiter, not outside
It happens: that regexp is quite complex. I suggest you use the cited regexp tester; I find it very useful.
0

You can use a different regex that matches only the decimal numbers (note that it skips the leading abc):

\\d+\\.\\d+
  • \\d+ Matches one or more digits
  • \\. Matches a .

Example

String line = "abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40";
String regex = "\\d+\\.\\d+";
Pattern ptrn = Pattern.compile(regex);
Matcher matcher = ptrn.matcher(line); 
while (matcher.find())
{
    System.out.println(matcher.group());
}

/* Output

0.00
11.00
0.00
4.50
0.00
124.00
27.56
0.01
1.44
0.89
0.40

*/
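
If the matched values are needed as numbers rather than strings, the same loop can parse each match. A brief sketch under that assumption; the class DecimalExtractor and the method extractDecimals are hypothetical names, and the label abc is still skipped because it does not match the pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalExtractor {

    // Collects every digits.digits token in the line as a double.
    static List<Double> extractDecimals(String line) {
        List<Double> values = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+\\.\\d+").matcher(line);
        while (matcher.find()) {
            values.add(Double.parseDouble(matcher.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [0.0, 11.0, 4.5]
        System.out.println(extractDecimals("abc   0.00   11.00   4.50"));
    }
}
```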


0

Why do you want to make it so complex?

Use split with \\s+, which matches one or more whitespace characters.

Then read the values from the resulting array.

String string = "sdn agsd ds     adg sd g dsg sdg  dsg dsg";
String []tokens = string.split("\\s+");
for(int i = 0; i < tokens.length; ++i)
  System.out.println(tokens[i]);


0

try:

    String line = "abc               0.00    11.00    0.00    4.50     0.00   124.00    27.56     0.01    1.44   0.89   0.40";
    Pattern p = Pattern.compile("[^\\s]+");
    Matcher m = p.matcher(line);
    while(m.find()){
        System.out.println(m.group());
    }
