Autumn Skye

This is the method I use for parsing large (1 GB+) tab-delimited files. It has far less overhead than String.split(), but it only accepts a single char as the delimiter. If anyone has a faster method, I'd like to see it. The same approach also works over CharSequence and CharSequence.subSequence, but CharSequence has no indexOf(char), so you have to implement that yourself (refer to the package-private method String.indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) if interested).

public static String[] split(final String line, final char delimiter)
{
    // Worst case: every char is a delimiter, which yields length + 1 tokens.
    // (Sizing this as length / 2 + 1 overflows on input such as "\t\t\t".)
    String[] temp = new String[line.length() + 1];
    int wordCount = 0;
    int i = 0;                           // start of the current token
    int j = line.indexOf(delimiter, 0);  // first delimiter

    while (j >= 0)
    {
        temp[wordCount++] = line.substring(i, j);
        i = j + 1;
        j = line.indexOf(delimiter, i);  // next delimiter
    }

    temp[wordCount++] = line.substring(i); // last token (may be empty)

    // Trim the scratch array down to the number of tokens actually found.
    String[] result = new String[wordCount];
    System.arraycopy(temp, 0, result, 0, wordCount);

    return result;
}
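
For the CharSequence variant mentioned above, here is a minimal sketch rather than a tuned implementation. The class name CharSequenceSplit and the helper indexOf(CharSequence, char, int) are made up for illustration (CharSequence itself has no indexOf), and it collects tokens in a List instead of a pre-sized array to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public final class CharSequenceSplit
{
    // Hypothetical helper: linear scan standing in for the missing
    // CharSequence.indexOf(char, int). Returns -1 when not found.
    static int indexOf(final CharSequence seq, final char target, final int fromIndex)
    {
        for (int k = Math.max(fromIndex, 0); k < seq.length(); k++)
        {
            if (seq.charAt(k) == target)
            {
                return k;
            }
        }
        return -1;
    }

    // Same loop as the String version, but using subSequence instead of substring.
    static List<CharSequence> split(final CharSequence line, final char delimiter)
    {
        final List<CharSequence> tokens = new ArrayList<>();
        int i = 0;
        int j = indexOf(line, delimiter, 0);

        while (j >= 0)
        {
            tokens.add(line.subSequence(i, j));
            i = j + 1;
            j = indexOf(line, delimiter, i);
        }

        tokens.add(line.subSequence(i, line.length())); // last token (may be empty)
        return tokens;
    }

    public static void main(final String[] args)
    {
        // Works on any CharSequence, e.g. a StringBuilder used as a line buffer.
        final StringBuilder buffer = new StringBuilder("a\tb\t\tc");
        System.out.println(split(buffer, '\t').size()); // prints 4
    }
}
```

Like the String version, this keeps empty tokens, including a trailing one, whereas String.split(String) drops trailing empty strings. Whether it is actually faster depends on the CharSequence implementation behind it, so measure before switching.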
