Sorry if this is a lame question; I am quite new to Java development and regex patterns.

I have a long string that contains multiple occurrences of substrings like InstanceId: i-1234XYAadsadd, and I want to extract each i-1234XYAadsadd part into an ArrayList using a regex. Please help with the correct regular expression.

// instanceResultString is the actual string containing occurrences of the pattern
List<String> instanceIdList = new ArrayList<String>();
Matcher matcher = Pattern.compile("InstanceId:[.]*,").matcher(instanceResultString);
while (matcher.find())
    instanceIdList.add(matcher.group());
  • Maybe "InstanceId:\\s*(\\S+)," and access .group(1) Commented Sep 26, 2016 at 20:24
  • I guess this will include the whitespace after InstanceId:, and I want to exclude it. Commented Sep 26, 2016 at 20:25
  • Ideone is too slow now, can't show the demo. Try Matcher matcher = Pattern.compile("InstanceId:\\s*(\\S+),").matcher(instanceResultString); while(matcher.find()) instanceIdList.add(matcher.group(1)); Commented Sep 26, 2016 at 20:27
  • See ideone.com/LaFmXw Commented Sep 26, 2016 at 20:30
  • Great, thanks a lot :) Commented Sep 26, 2016 at 20:33

1 Answer

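The key point is that the strings you want to match consist of non-whitespace characters, and the \S pattern matches a single non-whitespace character. Your original [.]* does not work because a dot inside a character class loses its wildcard meaning, so [.]* matches only a run of literal dots.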

See this demo:

String instanceResultString = "InstanceId: i-1234XYAadsadd, More text: InstanceId: u-222tttt, dde InstanceId: i-8999UIIIgjkkd,";
List<String> instanceIdList = new ArrayList<String>();
// Group 1 captures the ID that sits between the optional whitespace and the trailing comma
Matcher matcher = Pattern.compile("InstanceId:\\s*(\\S+),").matcher(instanceResultString);
while (matcher.find()) {
    instanceIdList.add(matcher.group(1));
}
System.out.println(instanceIdList); // Demo line
// => [i-1234XYAadsadd, u-222tttt, i-8999UIIIgjkkd]

Where

  • InstanceId: - the literal text InstanceId:
  • \\s* - zero or more whitespace characters
  • (\\S+) - Group 1 (retrieved with .group(1)), capturing one or more non-whitespace characters, as many as possible
  • , - a literal comma.
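
For reuse, the pattern broken down above could be wrapped in a small helper. This is only a sketch: the class name InstanceIdExtractor and the method extractInstanceIds are hypothetical names chosen for illustration, and the pattern is the same one used in the demo, compiled once as a constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstanceIdExtractor {
    // Same pattern as in the demo, compiled once and reused across calls.
    private static final Pattern INSTANCE_ID =
            Pattern.compile("InstanceId:\\s*(\\S+),");

    // Hypothetical helper: returns every captured ID in order of appearance.
    static List<String> extractInstanceIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = INSTANCE_ID.matcher(text);
        while (matcher.find()) {
            // Group 1 excludes the label, the whitespace, and the trailing comma.
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String input = "InstanceId: i-1234XYAadsadd, More text: "
                + "InstanceId: u-222tttt, dde InstanceId: i-8999UIIIgjkkd,";
        System.out.println(extractInstanceIds(input));
        // => [i-1234XYAadsadd, u-222tttt, i-8999UIIIgjkkd]
    }
}
```

Because \\S+ is greedy and a comma is itself a non-whitespace character, an ID that is followed by a comma and then immediately by more non-whitespace text would be captured together with that text; the sketch assumes each ID's comma is followed by whitespace or the end of the string, as in the sample input.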
1 Comment

Note that you might also consider the InstanceId:\\s*(\\S+)\\b pattern. It does not require a comma: after InstanceId: and zero or more whitespace characters, it captures the run of non-whitespace characters up to the last word boundary within that run.
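
To see how the word-boundary variant differs from the comma variant, here is a small comparison sketch. The class name WordBoundaryDemo, the helper extract, and the sample input (whose last ID has no trailing comma) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Variant from the answer: requires a comma right after the ID.
    static final Pattern WITH_COMMA = Pattern.compile("InstanceId:\\s*(\\S+),");
    // Variant from the comment: ends the capture at the last word boundary.
    static final Pattern WITH_BOUNDARY = Pattern.compile("InstanceId:\\s*(\\S+)\\b");

    // Hypothetical helper: collects group 1 of every match of the given pattern.
    static List<String> extract(Pattern pattern, String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Assumed input: the final ID is not followed by a comma.
        String input = "InstanceId: i-1234XYAadsadd, last one InstanceId: u-222tttt";
        System.out.println(extract(WITH_COMMA, input));    // => [i-1234XYAadsadd]
        System.out.println(extract(WITH_BOUNDARY, input)); // => [i-1234XYAadsadd, u-222tttt]
    }
}
```

The trade-off is that \\b trims any trailing non-word characters from the capture, so an ID that legitimately ended in punctuation would lose that punctuation.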
