
How do I split this comma- and quote-delimited String into an array of strings:

String test = "[\"String 1\",\"String, two\"]"; 
String[] embeddedStrings = test.split("<insert magic regex here>");
//note: It should also work for this string, with a space after the separating comma: "[\"String 1\", \"String, two\"]";    

assertEquals("String 1", embeddedStrings[0]);
assertEquals("String, two", embeddedStrings[1]);

I'm fine with trimming the square brackets as a first step. But the catch is, even if I do that, I can't just split on a comma because embedded strings can have commas in them. Using Apache StringUtils is also acceptable.

  • So your output will always be 'String 1' and 'String, two'? I guess you have comma-delimited, quote-enclosed fields. Are the quotes optional or required? Commented Dec 22, 2009 at 21:31

4 Answers


You could also use one of the many small open-source libraries for parsing CSV, e.g. opencsv or Commons CSV.
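If adding a dependency isn't an option, the single split regex the question asks for can be approximated with a lookahead: treat a comma as a separator only when an even number of double quotes follows it, which means it sits outside any quoted string. This is a swapped-in technique, not what a CSV library does internally. A minimal sketch, assuming the input is wrapped in `[` and `]`, every element is wrapped in double quotes, and no element contains an escaped quote; the class name `LookaheadSplit` is hypothetical.

```java
import java.util.Arrays;

class LookaheadSplit {
    // A comma (with optional spaces around it) is a separator only if an even
    // number of double quotes follows it, i.e. it is not inside a quoted string.
    private static final String SEPARATOR =
            "\\s*,\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Assumes: input looks like ["a", "b, c"], every element is quoted,
    // and no element contains an escaped quote.
    static String[] split(String test) {
        String inner = test.substring(1, test.length() - 1).trim(); // drop [ and ]
        String[] parts = inner.split(SEPARATOR);
        for (int i = 0; i < parts.length; i++) {
            // Drop the surrounding quotes of each element.
            parts[i] = parts[i].substring(1, parts[i].length() - 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("[\"String 1\",\"String, two\"]")));
        System.out.println(Arrays.toString(split("[\"String 1\", \"String, two\"]")));
    }
}
```

The lookahead rescans the rest of the string at every comma, so this is only reasonable for short input; a real CSV parser does a single pass.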


If you can remove [\" from the start of the outer string and \"] from the end of it, so that it becomes:

      String test = "String 1\",\"String, two"; 

You can use:

     test.split("\",\"");


  • I ended up going with this. It's ugly, as most regex is, but it's effective and my options are limited:

      String noBrackets = StringUtils.substringBetween(test, "[\"", "\"]");
      String[] results = noBrackets.split("\",[ ]*\"");
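The same approach also works without Commons Lang. In this sketch, `substring` stands in for `StringUtils.substringBetween`, and the separator regex `", *"` allows any number of spaces after the comma, like the `",[ ]*"` used above. The class and method names are hypothetical, and it assumes the input is exactly of the form `["..."]`.

```java
import java.util.Arrays;

class SplitBetweenQuotes {
    static String[] split(String test) {
        // Plain-JDK stand-in for StringUtils.substringBetween(test, "[\"", "\"]").
        if (test.length() < 4 || !test.startsWith("[\"") || !test.endsWith("\"]")) {
            throw new IllegalArgumentException("expected [\"...\"]: " + test);
        }
        String noBrackets = test.substring(2, test.length() - 2);
        // Split on: quote, comma, any number of spaces, quote.
        return noBrackets.split("\", *\"");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("[\"String 1\",\"String, two\"]")));
        System.out.println(Arrays.toString(split("[\"String 1\", \"String, two\"]")));
    }
}
```

Like the original, this would break if an embedded string itself contained a quote followed by a comma.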

This is extremely fragile and should be avoided, but instead of splitting you could match the string literals themselves.

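import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A quoted literal: any char other than a quote or backslash, or a backslash
// escape such as \" (consumed as a unit so it doesn't end the literal).
Pattern p = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

String test = "[\"String 1\",\"String, two\"]";
Matcher m = p.matcher(test);
List<String> embeddedStrings = new ArrayList<String>();
while (m.find()) {
    embeddedStrings.add(m.group(1)); // the text between the quotes
}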

The regular expression assumes that double quotes in the input are escaped using \" and not "". The pattern would break if the input had an odd number of (unescaped) double quotes.
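A self-contained version of the matching approach, as a sketch. It uses the pattern `"((?:[^"\\]|\\.)*)"`, a variant of the one above in which a backslash and the character after it are consumed together, so an escaped `\"` inside a literal does not end the match. Escape sequences are returned as-is, not unescaped. The class name `QuotedLiteralMatcher` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedLiteralMatcher {
    // "..." whose body is any run of non-quote, non-backslash chars
    // or backslash-anything pairs (so \" stays inside the literal).
    private static final Pattern LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> literals(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // body without the surrounding quotes
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(literals("[\"String 1\", \"String, two\"]"));
        // Input here is: ["say \"hi\"","x"]
        System.out.println(literals("[\"say \\\"hi\\\"\",\"x\"]"));
    }
}
```

Commas and spaces outside the quotes are simply never matched, which is why this handles both the `","` and `", "` forms without any special casing.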


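Brute-force method: walk the string once, tracking whether you're inside quotes, and only treat a comma as a separator when you're not. This assumes that the brackets are already removed.

import java.util.ArrayList;
import java.util.List;

// e.g. test = "\"String 1\", \"String, two\""
boolean inQuote = false;
List<String> strings = new ArrayList<String>();
StringBuilder current = new StringBuilder();
for (int i = 0; i < test.length(); i++) {
  char c = test.charAt(i);
  if (c == '"') {
    inQuote = !inQuote;              // toggle quote state; the quote itself is dropped
  } else if (inQuote) {
    current.append(c);               // inside quotes, keep everything, commas included
  } else if (c == ',') {
    strings.add(current.toString()); // a comma outside quotes ends the field
    current.setLength(0);
  }
  // any other character outside quotes (e.g. the space after a comma) is skipped
}
strings.add(current.toString());     // last field
String[] embeddedStrings = strings.toArray(new String[0]);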
