
Thanks everyone ^_^, the problem is solved: a single line was too big (over 400 MB... I had downloaded a damaged file without realizing it), so it threw an OutOfMemoryError.
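(For anyone who hits the same thing: a minimal sketch of how such an oversized line can be located, by scanning with a fixed char buffer instead of readLine() so no single line has to fit in memory. The path and the 1000-char threshold below are just placeholders.)

import java.io.BufferedReader;
import java.io.FileReader;

public class FindLongLine {
    public static void main(String[] args) throws Exception {
        // Placeholder path; point it at the suspect file.
        BufferedReader reader = new BufferedReader(new FileReader("/home/work/bingo/level.txt"));
        try {
            char[] buffer = new char[8192];
            long lineNo = 1;
            long lineLength = 0;
            int read;
            while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
                for (int j = 0; j < read; j++) {
                    if (buffer[j] == '\n') {
                        // expected lines are ~100 chars, so anything much longer is suspicious
                        if (lineLength > 1000) {
                            System.out.println("line " + lineNo + " is " + lineLength + " chars long");
                        }
                        lineNo++;
                        lineLength = 0;
                    } else {
                        lineLength++;
                    }
                }
            }
        } finally {
            reader.close();
        }
    }
}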

I want to split a file using Java, but it always throws OutOfMemoryError: Java heap space. I searched the whole Internet, but nothing seems to help :(

PS: the file's size is 600 MB, and it has over 30,000,000 lines; every line is no longer than 100 chars. (Maybe you can generate a "level file" like this: { id:0000000001,level:1 id:0000000002,level:2 ... (over 30 million) })

PPS: setting the JVM memory size larger does not work :(

PPPS: I changed to another PC; the problem remains /(ㄒoㄒ)/~~

No matter how large the -Xms or -Xmx I set, the output file's size is always the same (and the Runtime.getRuntime().totalMemory() does truly change).

here's the stack trace:

 Heap Size = 2058027008
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuffer.append(StringBuffer.java:306)
        at java.io.BufferedReader.readLine(BufferedReader.java:345)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at com.xiaomi.vip.tools.ptupdate.updator.Spilt.main(Spilt.java:39)
    ...

here's my code:

package com.updator;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;

public class Spilt {
    public static void main(String[] args) throws Exception {
        long heapSize = Runtime.getRuntime().totalMemory();

        // Print the jvm heap size.
        System.out.println("Heap Size = " + heapSize);

        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();

            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            BufferedReader bufferedReader = new BufferedReader(new FileReader(
                    new File(sourcePath)));

            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part"
                    + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                line += '\n';
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    System.gc();
                    count = 0;
                    i++;
                    file = new File(savePath + "part"
                            + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }

            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } finally {
            if (inputStream != null)
                inputStream.close();
            if (outputStream != null)
                outputStream.close();
        }
    }
}

I think maybe the memory is not released when outputStream.close() is called?

  • Show the exception and stack trace Commented Jan 4, 2017 at 8:04
  • Check out the following link which has been answered by others - stackoverflow.com/questions/11578123/… Commented Jan 4, 2017 at 8:15
  • 1
    Why do you use a Scanner? You don't need the functionality, a BufferedReader would be enough and much less resource-hungry. Commented Jan 4, 2017 at 8:25
  • @FlameHaze, I tried, but no matter how large the -Xms or -Xmx I set, the output file's size is always the same (the Runtime.getRuntime().totalMemory() does truly change). Commented Jan 4, 2017 at 8:28
  • 2
    Well the stack is pretty clear : bufferedReader.readLine throws outOfMemory. The most straightforward cause to look for is : there is a single line does not fit into memory. (And you could System.out.println a line count to see which one). Commented Jan 4, 2017 at 10:20

2 Answers


So you open the original file and create a BufferedReader and a counter for the lines.

char[] buffer = new char[5120];
BufferedReader reader = Files.newBufferedReader(Paths.get(sourcePath), StandardCharsets.UTF_8);
int lineCount = 0;

Now you read into your buffer, and write the characters as they come in.

int read;

BufferedWriter writer = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8);
while ((read = reader.read(buffer, 0, 5120)) > 0) {
    int offset = 0;
    for (int i = 0; i < read; i++) {
        char c = buffer[i];
        if (c == '\n') {
            lineCount++;
            if (lineCount == maxLineCount) {
                // write the range from offset up to and including this newline to your old writer
                writer.write(buffer, offset, i - offset + 1);
                writer.close();
                offset = i + 1;
                lineCount = 0;
                writer = Files.newBufferedWriter(Paths.get(newName), StandardCharsets.UTF_8);
            }
        }
    }
    // write whatever is left in this buffer to the current writer
    writer.write(buffer, offset, read - offset);
}
writer.close();

That should keep the memory usage low and prevent you from having to read an overly long line into memory all at once. You could go without BufferedWriters and control the memory even more, but I don't think that is necessary.
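For completeness, here's a small self-contained sketch of how the fileName / newName placeholders above could be produced; the partNNNNN.txt naming and directory are assumptions borrowed from the question, not something this answer prescribes.

import java.nio.file.Path;
import java.nio.file.Paths;

public class PartNames {
    // Builds the next part-file name in the question's "partNNNNN.txt" scheme.
    // Directory and zero-padding width are taken from the question, not from this answer.
    static Path partFile(String dir, int index) {
        return Paths.get(dir, "part" + String.format("%05d", index) + ".txt");
    }

    public static void main(String[] args) {
        System.out.println(partFile("/home/work/bingo/tmp", 0)); // .../part00000.txt
        System.out.println(partFile("/home/work/bingo/tmp", 1)); // .../part00001.txt
    }
}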


2 Comments

Why 5120? And why read 5120 chars at once...? I mean, if one line is only 100 chars long, shouldn't that be worse?
5120 is a buffer size, and I just picked it arbitrarily. Since a buffered reader is being used, it doesn't matter; it would even work fine to read just one character at a time. Why do you think it will perform worse for a line that is 100 chars long?

I've tested with a large text file (250 MB).

It works well.

You need to add try/catch exception handling for the file streams.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;

public class MyTest {
    public static void main(String[] args) {
        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();

            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            Scanner scanner = new Scanner(inputStream, "UTF-8");

            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;

            while (scanner.hasNextLine()) {
                String line = scanner.nextLine() + "\n";
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    count = 0;
                    i++;
                    file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }

            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: FileNotFoundException :: " + e.getStackTrace());
        } catch (IOException e) {
            System.out.println("ERROR: IOException :: " + e.getStackTrace());
        } finally {
            try {
                if (inputStream != null)
                    inputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                if (outputStream != null)
                    outputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

If the problem still occurs, change the Java heap memory size with the following command at the shell prompt.

e.g. -Xmx1g : 1 GB heap memory size, MyTest : class name

java -Xmx1g MyTest

4 Comments

I've tried again with a 2 GB text file, but there are no issues. My system environment: Intel i5 / Java 1.7 / 6 GB memory.
If your system has very little memory, decrease the line count number, for example from 4000000 to 400.
I tried, but the problem remains :( PS: my environment: i7 / Java 1.6 / 16 GB memory, and the output files' total size is the same, too.
Thank you very much... but changing the heap memory does not work... /(ㄒoㄒ)/~~
