
Struggling to write 300k rows to a CSV file through Apache POI (Java). I have been trying to generate a CSV file from an Excel file with 300k rows. Every time, I get a "GC overhead limit exceeded" OutOfMemoryError when it tries to write to the output CSV file. I even tried splitting the write into chunks of 100k rows. The output file keeps growing in size, but the System.out.println statement is never printed.

import javafx.beans.binding.StringBinding;
import org.apache.poi.hssf.record.crypto.Biff8EncryptionKey;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Timestamp;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadWrite {
    private static Logger logger= LoggerFactory.getLogger(ReadWrite.class);

    public static void main(String[] args) {
        try {
            long startReading = System.currentTimeMillis();
            Path path = Paths.get("/Users/venkatesh/Documents/Citiout_files/citiout300k_2sheets.xlsx");

            byte[] result = new byte[0];
            try {
                result = Files.readAllBytes(path);
            } catch (IOException e) {
                e.printStackTrace();
            }
            InputStream is = new ByteArrayInputStream(result);

            Workbook workbook = WorkbookFactory.create(is);

            long readDone = System.currentTimeMillis() - startReading;
            logger.info("read time " + readDone);



            Sheet sheet = workbook.getSheetAt(1);
            Row firstRow = sheet.getRow(0);
            int headcol = firstRow.getLastCellNum();
            long startTransform = System.currentTimeMillis();
            firstRow.createCell(headcol++).setCellValue("Sold Amount1");
            firstRow.createCell(headcol++).setCellValue("CF_Quantity1");
            firstRow.createCell(headcol++).setCellValue("CF_Quantity2");
            firstRow.createCell(headcol++).setCellValue("CF_TradePrice");
            firstRow.createCell(headcol++).setCellValue("CF_ForwardPrice");
            firstRow.createCell(headcol++).setCellValue("CF_UnrealizedPL");
            firstRow.createCell(headcol++).setCellValue("CF_Quantity1Round");
            firstRow.createCell(headcol++).setCellValue("CF_Quantity2Round");
            firstRow.createCell(headcol++).setCellValue("CF_FXLotKeyNoTradeDate");
            firstRow.createCell(headcol++).setCellValue("CF_FXRoundedKeyNoTradeDate");
            firstRow.createCell(headcol++).setCellValue("CF_SettlementDate");
            for (int i = 1; i <=sheet.getLastRowNum()+1; i++) {
                String jj="";
                Row nRow = sheet.getRow(i-1);
                for(Cell c:nRow) {
                    if (c.getColumnIndex()==3 && i!=1) {
                        Calendar cal = Calendar.getInstance();
                        Date date1 = new SimpleDateFormat("dd-MMM-yyyy").parse(c.getStringCellValue());
                        cal.setTime(date1);
                        jj = String.valueOf(cal.get(Calendar.MONTH) + 1) + "/" + String.valueOf(cal.get(Calendar.DAY_OF_MONTH)) + "/" + String.valueOf(cal.get(Calendar.YEAR));
                        // write the parsed date back into the cell as an Excel serial date
                        c.setCellValue(DateUtil.getExcelDate(cal.getTime()));
                    }
                }
                int count = nRow.getLastCellNum();
                //System.out.println(nRow.getCell(3).getClass());
                nRow.createCell(count++).setCellFormula("G" + i + "*-1");
                nRow.createCell(count++).setCellFormula("E" + i + "/" + "G" + i);
                nRow.createCell(count++).setCellFormula("G" + i + "/E" + i);
                nRow.createCell(count++).setCellFormula("ROUND(ABS(T" + i + "/S" + i + "),6)");
                nRow.createCell(count++).setCellFormula("ROUND(K" + i + ",6)");
                nRow.createCell(count++).setCellFormula("ROUND(N" + i + ",2)");
                nRow.createCell(count++).setCellFormula("ROUND(S" + i + ",0)");
                nRow.createCell(count++).setCellFormula("ROUND(T" + i + ",0)");
                nRow.createCell(count++).setCellFormula("CONCATENATE(T" + i + "," + "\"~\"" + ",S" + i + ")");
                nRow.createCell(count++).setCellFormula("CONCATENATE(X" + i + "," + "\"~\"" + ",Y" + i + ")");
                nRow.createCell(count++).setCellValue(jj);

            }
            long endTransform = System.currentTimeMillis() - startTransform;
            System.out.println("Transformations time " + endTransform);
            final FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
            FileWriter writer = new FileWriter(new File("/Users/venkatesh/Documents/cit300k.csv"));
            StringBuilder data = new StringBuilder();
            Iterator<Row> rowIterator = workbook.getSheetAt(1).iterator();

            try {
                while (rowIterator.hasNext()) {

                    Row row = rowIterator.next();


                    Iterator<Cell> cellIterator = row.cellIterator();
                    while (cellIterator.hasNext()) {
                        Cell cell = cellIterator.next();

                        CellType type = cell.getCellType();
                        if (type == CellType.BOOLEAN) {
                            data.append(cell.getBooleanCellValue());
                        } else if (type == CellType.NUMERIC) {
                            data.append(cell.getNumericCellValue());

                        } else if (type == CellType.STRING) {
                            data.append(cell.getStringCellValue());
                        } else if (type == CellType.FORMULA) {
                            switch (evaluator.evaluateFormulaCell(cell)) {
                                case STRING:
                                    data.append(cell.getStringCellValue());
                                    break;
                                case NUMERIC:
                                    data.append(cell.getNumericCellValue());
                                    break;
                            }
                        } else if (type == CellType.BLANK) {
                        } else {
                            data.append(cell + "");
                        }
                        data.append(",");
                    }
                    writer.append(data.toString());
                    writer.append('\n');
                }
            } catch(Exception e){
                e.printStackTrace();
            }
            finally{
                if(writer!=null){
                    writer.flush();
                    writer.close();
                }
            }

            for (MemoryPoolMXBean mpBean: ManagementFactory.getMemoryPoolMXBeans()) {
                if (mpBean.getType() == MemoryType.HEAP) {
                    System.out.printf(
                            "Name: %s: %s\n",
                            mpBean.getName(), mpBean.getUsage()
                    );
                }
            }
            try {
                workbook.close();
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        catch (Exception e){
            e.printStackTrace();
        }
    }
}


20-01-12 19:52:49:267  INFO main ReadWrite:64 - read time 11354
Transformations time 38659
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeMap$Values.iterator(TreeMap.java:1031)
    at org.apache.poi.xssf.usermodel.XSSFRow.cellIterator(XSSFRow.java:117)
    at org.apache.poi.xssf.usermodel.XSSFRow.iterator(XSSFRow.java:132)
    at org.apache.poi.xssf.usermodel.XSSFEvaluationSheet.getCell(XSSFEvaluationSheet.java:86)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:402)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:275)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:216)
    at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:56)
    at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluateFormulaCell(BaseFormulaEvaluator.java:185)
    at ReadWrite.main(ReadWrite.java:150)

  • Please show us the complete stacktrace for the OutOfMemoryError. Commented Jan 11, 2020 at 23:59
  • If you want to write a CSV file, why not just write it as a CSV? Why use Apache POI if you don't actually need a full XLS spreadsheet? Commented Jan 12, 2020 at 9:17
  • @StephenC I have attached the stacktrace of the error. Please look into it and let me know. Thanks very much for your reply. Commented Jan 13, 2020 at 0:37
  • @Gagravarr I need a full XLSX spreadsheet. I am trying to write those 300k rows after manipulating them with some data. The new data contains extra columns compared to the file I used for reading, so I needed POI to do some formula evaluations. Commented Jan 13, 2020 at 0:40
  • 1) Do not post error messages, stacktraces, etc. as screenshots; use cut and paste to copy them into the question. 2) The stacktrace doesn't match the code: it shows that you are executing statements that are commented out. 3) The problem is not in writing the CSV, it is in evaluating the formulae. That is clear from the stacktrace. Commented Jan 13, 2020 at 0:43

1 Answer


So now that we have a usable stacktrace, it is clear that the problem is NOT happening while writing the CSV file. It is actually happening while you are evaluating a spreadsheet formula. My guess is that the formula is summing across all rows in a sheet ... or something like that.

This is a problem, and there is probably no simple solution.

Here's what the POI documentation says:

File sizes/Memory usage

  • There are some inherent limits in the Excel file formats. These are defined in class SpreadsheetVersion. As long as you have enough main-memory, you should be able to handle files up to these limits. For huge files using the default POI classes you will likely need a very large amount of memory.
  • There are ways to overcome the main-memory limitations if needed:
    • For writing very huge files, there is SXSSFWorkbook which allows to do a streaming write of data out to files (with certain limitations on what you can do as only parts of the file are held in memory).
    • For reading very huge files, take a look at the sample XLSX2CSV which shows how you can read a file in streaming fashion (again with some limitations on what information you can read out of the file, but there are ways to get at most of it if necessary).

You are clearly running into these memory limitations. Basically, POI is trying to load too much of the spreadsheet into memory ... while you are evaluating the spreadsheet formulae ... and you are filling the heap.

One solution would be to increase the Java heap size. Or if you are already using all available RAM for your heap, run the conversion on a machine with more RAM. A lot of standard PCs have 16GB RAM these days. Maybe it is time for a hardware upgrade? But I'm guessing you have already thought of this.
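For reference, the heap is raised with the standard -Xmx JVM option when launching the program; the 8 GB figure and the classpath placeholder below are just examples to adapt to your machine and project:

    java -Xmx8g -cp <your-classpath> ReadWrite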

If increasing the heap size is not viable, then you will need to rewrite your application to use SXSSFWorkbook. Furthermore, you may need to replace your approach of using formula evaluation with doing the calculations in native Java in a way that is compatible with row-by-row streaming of the spreadsheet. (It will depend on what the formulae do.)
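To make the SXSSFWorkbook part concrete, here is a minimal sketch of the streaming-write pattern, assuming illustrative values for the window size, sheet name, output path and cell contents (none of this is your actual transformation logic):

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.FileOutputStream;

public class StreamingWriteSketch {
    public static void main(String[] args) throws Exception {
        // keep only 100 rows in memory; older rows are flushed to a temp file
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        Sheet sheet = wb.createSheet("out");
        for (int r = 0; r < 300_000; r++) {
            Row row = sheet.createRow(r);
            // write plain values computed in Java rather than formula cells,
            // so no FormulaEvaluator pass is needed afterwards
            row.createCell(0).setCellValue(r);
            row.createCell(1).setCellValue(r * -1.0);
        }
        try (FileOutputStream out = new FileOutputStream("streamed.xlsx")) {
            wb.write(out);
        }
        wb.dispose(); // delete the temporary files backing the flushed rows
    }
}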

Look at the linked example from the POI documentation for ideas.
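And as a rough illustration of replacing formula evaluation with native Java, the sketch below streams each row straight to a CSV writer and computes a few of the derived columns directly. The column indices (E = 4, G = 6) are guesses taken from the formulas in your code; you would need to extend it to the remaining columns and guard against missing or non-numeric cells:

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

import java.io.BufferedWriter;
import java.io.FileWriter;

public class CsvExportSketch {
    public static void writeCsv(Sheet sheet, String csvPath) throws Exception {
        DataFormatter fmt = new DataFormatter(); // renders a cell's value as display text
        try (BufferedWriter out = new BufferedWriter(new FileWriter(csvPath))) {
            for (Row row : sheet) {
                StringBuilder line = new StringBuilder(); // fresh builder per row
                for (Cell cell : row) {
                    line.append(fmt.formatCellValue(cell)).append(',');
                }
                if (row.getRowNum() > 0) { // the header row has nothing to derive
                    double e = row.getCell(4).getNumericCellValue(); // column E (assumed index)
                    double g = row.getCell(6).getNumericCellValue(); // column G (assumed index)
                    line.append(-g).append(',');    // "Sold Amount1"  = G * -1
                    line.append(e / g).append(','); // "CF_Quantity1"  = E / G
                    line.append(g / e);             // "CF_Quantity2"  = G / E
                }
                out.write(line.toString());
                out.newLine(); // each row is written and forgotten; nothing accumulates
            }
        }
    }
}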


1 Comment

Thanks for the reply and the documentation links. I tried these; I can write large files of up to 1 million rows to a CSV file, but in my application I am manipulating the Excel file and doing transformations by creating new cells with values. The SAX parser does not have a way to create a new cell; it just parses through rows and cells. Is there an alternative way to perform all of these operations and write to a CSV file for huge data files of up to 1M rows?
