
I have tried various pieces of code to convert a large CSV file (~300 MB) to a byte[], but each time it fails with a Java heap space error, as shown below:

    184898 [jobLauncherTaskExecutor-1] DEBUG org.springframework.batch.core.step.tasklet.TaskletStep - Rollback for Error: java.lang.OutOfMemoryError: Java heap space
    185000 [jobLauncherTaskExecutor-1] DEBUG org.springframework.transaction.support.TransactionTemplate - Initiating transaction rollback on application exception
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuffer.append(StringBuffer.java:237)
        at org.apache.log4j.helpers.PatternParser$LiteralPatternConverter.format(PatternParser.java:419)
        at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.java:601)
        at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
        at org.apache.http.impl.conn.Wire.wire(Wire.java:77)
        at org.apache.http.impl.conn.Wire.output(Wire.java:107)
        at org.apache.http.impl.conn.LoggingSessionOutputBuffer.write(LoggingSessionOutputBuffer.java:76)
        at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:119)
        at org.apache.http.entity.ByteArrayEntity.writeTo(ByteArrayEntity.java:115)
        at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
        at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
        at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
        at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
        at org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:227)
        at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:712)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)

So far, I have tried the following versions of the file-to-byte[] conversion code:

Version 1: Core Java

    File file = new File(fileName);
    FileInputStream fin = null;
    byte[] fileContent = null;

    try {
        fin = new FileInputStream(file);

        fileContent = new byte[(int) file.length()];

        // read() may return fewer bytes than requested, so loop until the array is full
        int offset = 0;
        while (offset < fileContent.length) {
            int bytesRead = fin.read(fileContent, offset, fileContent.length - offset);
            if (bytesRead == -1) {
                break;
            }
            offset += bytesRead;
        }

    } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
    } catch (IOException ioe) {
        System.out.println("Exception while reading file " + ioe);
    } finally {
        try {
            if (fin != null) {
                fin.close();
            }
        } catch (IOException ioe) {
            System.out.println("Error while closing stream: " + ioe);
        }
    }

    return fileContent;

Version 2: Java 7 NIO

    Path path = Paths.get(fileName);

    byte[] data = null;

    try {
        data = Files.readAllBytes(path);
    } catch (IOException e) {
        e.printStackTrace();
    }

    return data;

Version 3: Apache Commons IO

    File file = new File(fileName);
    FileInputStream fis = null;
    byte[] fileContent = null;

    try {
        fis = new FileInputStream(file);

        fileContent = IOUtils.toByteArray(fis);

    } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
    } catch (IOException ioe) {
        System.out.println("Exception while reading file " + ioe);
    } finally {
        try {
            if (fis != null) {
                fis.close();
            }
        } catch (IOException ioe) {
            System.out.println("Error while closing stream: " + ioe);
        }
    }

    return fileContent;

Version 4: Google Guava

    File file = new File(fileName);
    FileInputStream fis = null;
    byte[] fileContent = null;

    try {
        fis = new FileInputStream(file);

        fileContent = ByteStreams.toByteArray(fis);

    } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
    } catch (IOException ioe) {
        System.out.println("Exception while reading file " + ioe);
    } finally {
        try {
            if (fis != null) {
                fis.close();
            }
        } catch (IOException ioe) {
            System.out.println("Error while closing stream: " + ioe);
        }
    }

    return fileContent;

Version 5: Apache Commons IO FileUtils

    File file = new File(fileName);

    byte[] fileContent = null;

    try {
        fileContent = org.apache.commons.io.FileUtils.readFileToByteArray(file);
    } catch (FileNotFoundException e) {
        System.out.println("File not found: " + e);
    } catch (IOException ioe) {
        System.out.println("Exception while reading file " + ioe);
    }

    return fileContent;

I have even set up my heap space settings to be quite big: about 6 GB (5,617,772 K) for my external Tomcat, as shown by the memory consumption in Task Manager.

For the first three versions of the code, the heap usage suddenly increases to more than 5 GB upon hitting this byte[] generation code, and then it fails. With Google Guava it seemed very promising: memory consumption stayed at about 3.5 GB for quite some time (around 10 minutes) after hitting the byte[] generation code, but then it too suddenly jumped to more than 5 GB and failed.

I am unable to figure out a solution to this problem. Can somebody help me solve it? Any help would be greatly appreciated.

  • How to resolve? Don't read the entire file at once. Commented Aug 19, 2014 at 23:10
  • But note that the failure above didn't occur while reading the file. Rather, you ran out of memory while logging, perhaps because you attempted to log the entire file at once. Commented Aug 19, 2014 at 23:12
  • It doesn't matter how you read it: if the file is too big, it won't fit into memory. You don't need to do this; files can be processed a record at a time. Commented Aug 19, 2014 at 23:19
  • However big your heap is, there's always a file bigger than it. Read CSV files line by line; there are plenty of libraries for that, e.g. opencsv. Commented Aug 19, 2014 at 23:24
  • @MSR: the way you send the request is also relevant to this issue. Is it a direct call to the Apache HttpClient API, or is it done through a framework? Sure, turning off logging would help (configure Log4j to disable DEBUG logging for org.apache.http.wire), but what you should really be doing is switching to a streaming implementation of your HTTP request: use a ContentProducer and a custom EntityTemplate instead of a ByteArrayEntity (see the sketch after these comments). Commented Aug 20, 2014 at 0:27
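For reference, a minimal sketch of the streaming approach suggested in the last comment, assuming Apache HttpClient 4.x; the HttpPost target URL and the 8 KB buffer size are illustrative assumptions, not part of the original question:

    // Minimal sketch (assumed HttpClient 4.x): stream the CSV into the request body
    // using org.apache.http.entity.ContentProducer + EntityTemplate instead of
    // buffering the whole file in a byte[] (as a ByteArrayEntity would require).
    ContentProducer producer = new ContentProducer() {
        public void writeTo(OutputStream out) throws IOException {
            InputStream in = new FileInputStream(fileName); // fileName must be (effectively) final here
            try {
                byte[] buffer = new byte[8192]; // only this small buffer is held in memory at a time
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                in.close();
            }
        }
    };
    HttpPost post = new HttpPost(targetUrl); // targetUrl is a placeholder
    post.setEntity(new EntityTemplate(producer));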

1 Answer


A 300 MB file will not consume 6 GB of heap when loaded into a byte array. And looking closer at your stack trace, the loading part seems completely fine: the java.lang.OutOfMemoryError: Java heap space is thrown only when something is logged through Log4j. The top of the trace shows HttpClient's wire logging (org.apache.http.impl.conn.Wire) handing the request body to Log4j, which then tries to build the entire log message in an in-memory StringBuffer, and that is what blows the heap.

The logging seems to originate from third-party code rather than your own, so you might not be able to change what is being logged, but you can definitely reduce the logging via Log4j configuration: raise the log level (to WARN, ERROR or FATAL) for org.apache.* and you should be good to go.
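For example, a minimal sketch of such a configuration change, assuming Log4j 1.x (which matches the org.apache.log4j classes in the stack trace); the equivalent log4j.properties entry would be log4j.logger.org.apache=WARN:

    // Minimal sketch (assumed Log4j 1.x): raise the level of the chatty Apache loggers,
    // in particular the wire logger that dumps the entire request body at DEBUG level.
    org.apache.log4j.Logger.getLogger("org.apache").setLevel(org.apache.log4j.Level.WARN);
    org.apache.log4j.Logger.getLogger("org.apache.http.wire").setLevel(org.apache.log4j.Level.WARN);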


1 Comment

Thanks Ivo for guiding me so clearly on this problem. Others pointed to the logging issue too, but your suggestion helped me identify the exact cause. It turns out that any of the code versions above can be used for this purpose without causing the Java heap space issue; it's the Apache HTTP wire logging that leads to the problem.
