1

I am getting a java outofmemory error. I added the necessary java arguments, but I still keep getting this error. I shared the libraries and functions that I use. The function It gives this error when converting a large csv file (about 15mb) into an xlsx file. It working properly on small files without any errors. How can i fix this error? Thanks.

I added these java args on Intellij Idea

I got error

I use this libraries

Main

public class Main {

    public static void main(String[] args) {

        convert_CSV_to_XLSX(S.CSV_PATH,S.XLSX_PATH,"Sheet");

    }

}

Convert CSV to XLSX

public void convert_CSV_to_XLSX(String inputFilePath, String outputFilePath, String sheetName) {
        try {
            ArrayList<ArrayList<Object>> csvObjectsAll = readCSV(inputFilePath);
            writeXLSX_horizontally(outputFilePath, csvObjectsAll, sheetName);
        } catch (Exception e) {
            e.printStackTrace();
        }
}

ReadCSV

public ArrayList<ArrayList<Object>> readCSV(String inputFilePath) {
        ArrayList<ArrayList<Object>> gal = new ArrayList<>();
        try {
            String csvStr = new String(Files.readAllBytes(Paths.get(inputFilePath)), StandardCharsets.UTF_8);
            for (String str : csvStr.split("\n")) {
                ArrayList<Object> csvLinesSplit = new ArrayList<>();
                String ss = str.replaceAll("\"", "");
                if (ss.charAt(ss.length() - 1) == ',') {
                    ss += "$";
                }
                for (String s : ss.split(",")) {
                    if (s.equals("") || s.equals("$")) {
                        csvLinesSplit.add("");
                    } else {
                        csvLinesSplit.add(s);
                    }
                }
                gal.add(csvLinesSplit);
            }
        } catch (Exception e) {

        }
        return gal;
}

Write XLSX

public void writeXLSX_horizontally(String outputFileName, ArrayList<ArrayList<Object>> gdl, String sheetName) {

        XSSFWorkbook workbook = new XSSFWorkbook();
        XSSFSheet sheet = workbook.createSheet(sheetName);

        int rowNum = 0;
        for (ArrayList<Object> objectArrList : gdl) {
            Row row = sheet.createRow(rowNum++);
            int cellNum = 0;
            for (Object obj : objectArrList) {
                Cell cell = row.createCell(cellNum++);
                boolean is_double = false, is_integer = false;
                try {
                    cell.setCellValue(Double.parseDouble(obj.toString()));
                    is_double = true;
                } catch (Exception e) {
                }
                if (!is_double) {
                    try {
                        cell.setCellValue(Integer.parseInt(obj.toString()));
                        is_integer = true;
                    } catch (Exception e) {

                    }
                }
                if (!is_double && !is_integer) {
                    if (obj == null) {
                        cell.setCellValue(new String());
                    } else {
                        cell.setCellValue(obj.toString());
                    }
                }
            }
        }
        try {
            FileOutputStream file = new FileOutputStream(outputFileName);
            workbook.write(file);
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
}
8
  • 2
    Which line in your code is your error occurring at? Also, are you sure that you're setting the jvm arguments -Xmx2048M? I think even the default value should be pretty big, so I wouldn't expect a 15mb excel file to cause a problem. Commented May 13, 2021 at 11:27
  • Hi Jack, what is the name of the field you used for -Xmx param? You need to use VM options, not Program arguments Commented May 13, 2021 at 11:27
  • I replaced the reading operation in the readCSV function with BufferedReader. I put print (rowNum) in the for loop within the writeXLSX_horizontally function and noticed that after a while the loop slows down (probably memory sharing). I think the error is caused by the XSSFSheet object. In this case, the simplest solution would be to write each row object to the XSSFSheet object, then close the file, then add each row to the end of the file and do the same for each row. Commented May 13, 2021 at 11:32
  • new String() -> "" is a minor, but obvious improvement Commented May 13, 2021 at 11:35
  • I shared it in the first picture in the topic. I added argument over intellij idea. I just added "-Xms1024M -Xmx2048M". Do I need to add a different addition? Commented May 13, 2021 at 11:35

3 Answers 3

2

This line:

 String csvStr = new String(Files.readAllBytes(Paths.get(inputFilePath)), StandardCharsets.UTF_8);

Issue:

You are loading the whole file into the memory by using Files.readAllBytes. And the allocated memory for the jvm processor on which this program is running is not enough.

Possible Solution:

You may want to start reading the file using streams/buffers like BufferedReader. Or you can lookup other Readers that allow you to read the file in bits so the whole memory is not consumed all at once.

Further Modifications:

You will have to modify your program at the time of writing also where after you read bits of data, you process and and write to a file, and when the time comes to write to a file again, you append.

Sign up to request clarification or add additional context in comments.

2 Comments

Do you realize how small of an optimization this is? It's a 15mb file, so this is about a 15mb optimization. Also, they're talking about setting 1GB as a max.
Oh I didn't see that. GG
1

As discussed in comments, the problem was due to incorrect IntelliJ run configuration.

VM arguments need to be passed to a separate field in IntelliJ, not as "Program arguments".

Still, the program can be improved:

  • process input file line by line
  • use SXSSFWorkbook to write output

Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy. This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time.

  • use "" instead of new String()
  • not memory related: get generics right (you have strings in parsed CSV, not arbitrary objects)

Note that streaming both input and output is the best option. Having said that, 15MB input is tiny by todays standards, so I believe raising heap memory a bit is not a bad short-term solution

Comments

0

I removed shared memory on java virtual machine: -Xms1024M -Xmx12288M

Thanks to the @Faraz and @Lesiak perm solution write large xlsx file here:

Read CSV

public ArrayList<ArrayList<Object>> readCSV(String inputFilePath) {
        ArrayList<ArrayList<Object>> gal = new ArrayList<>();
        try {
            BufferedReader csvReader = new BufferedReader(new FileReader(inputFilePath));
            String row;
            int rowSize = 0;
            ArrayList<String> columnList = new ArrayList<>();
            while ((row = csvReader.readLine()) != null) {
                ArrayList<Object> rowCells = new ArrayList<>();
                if (rowSize == 0) {
                    if (row.charAt(row.length() - 1) == ',')
                        throw new Exception("CSV Format Error");
                    for (String columnName : row.split(",")) {
                        columnList.add(columnName);
                    }
                }
                int cellSize = 0;
                for (String cell : row.split(",")) {
                    if (cell.equals("")) {
                        rowCells.add(null);
                    } else {
                        rowCells.add(cell);
                    }
                    cellSize++;
                }
                if (cellSize != columnList.size()) {
                    for (int i = 0; i < columnList.size() - cellSize; i++) {
                        rowCells.add(null);
                    }
                }
                gal.add(rowCells);
                rowSize++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return gal;
}

Write XLSX

public void writeXLSX_horizontally(String outputFileName, ArrayList<ArrayList<Object>> gdl, String sheetName) {
        SXSSFWorkbook workbook = new SXSSFWorkbook();
        SXSSFSheet sheet = workbook.createSheet(sheetName);
        int rowNum = 0;
        for (ArrayList<Object> objectArrList : gdl) {
            Row row = sheet.createRow(rowNum++);
            int cellNum = 0;
            for (Object obj : objectArrList) {
                Cell cell = row.createCell(cellNum++);
                boolean is_double = false, is_integer = false;
                try {
                    cell.setCellValue(Double.parseDouble(obj.toString()));
                    is_double = true;
                } catch (Exception e) { }
                if (!is_double)
                    try {
                        cell.setCellValue(Integer.parseInt(obj.toString()));
                        is_integer = true;
                    } catch (Exception e) { }
                if (!is_double && !is_integer)
                    if (obj == null)
                        cell.setCellValue(new String());
                    else
                        cell.setCellValue(obj.toString());
            }
        }
        try {
            FileOutputStream file = new FileOutputStream(outputFileName);
            workbook.write(file);
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
}

Read large xlsx file solution here: How to read XLSX file of size >40MB

Other important library for read large xlsx file: https://github.com/monitorjbl/excel-streaming-reader

constraints: xlsx file line count must be between 0..1048575

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.