
I'm trying to fetch some data from MongoDB, but my k8s pods are hitting:

Terminating due to java.lang.OutOfMemoryError: Java heap space

Checking the heap dump, this code seems to be causing the trouble:

try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(query(criteria),
                              DocumentAnnotation.class,
                              ANNOTATIONS_COLLECTION_NAME)) {
    // stream over the cursor, filter in Java, and collect the surviving annotations
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED), false)
                        .filter(annotation -> isAnnotationAcceptedByFilter(annotation))
                        .collect(Collectors.toList());
}

In general, it creates an iterator using the Mongo driver's streaming API and iterates through all annotations returned by the database for the given criteria. The MongoDB driver seems to read annotations in batches of 47427 items (at least that is what I see in the heap dump), and even though most of them are filtered out in Java and never returned to the client, each such request allocates about 100MB of RAM to hold the batch, which is what is causing the problem.

Does anybody know if that bulk size is configurable?

Thanks

  • I think you may have misdiagnosed this. A batch of that size shouldn't be problematic. I suspect that the real problem is either your filter is NOT filtering out most of the items (so the resulting list is too big) OR there is a memory leak somewhere else. Commented Jun 18, 2021 at 0:39
  • But this Q&A is about setting the batch size: stackoverflow.com/questions/48072977 Commented Jun 18, 2021 at 0:42
  • It seems this partially explains the issue: stackoverflow.com/questions/15516462/… , but it's still not clear how to fix it ... Commented Jun 18, 2021 at 22:33
  • I don't see how it is relevant. Commented Jun 19, 2021 at 1:50
  • Well ... then ... that is your problem. If the filtered list requires 500MB to store, then you need that much memory. Or you need to change your application design / logic so that you don't need to create the list at all. (This has nothing to do with the batch sizes used by the driver.) Commented Jun 19, 2021 at 5:26

1 Answer


Based on what you have said in the comments, my opinion is that you have misdiagnosed the problem. The batch size (or "bulk size" as you called it) is not the problem, and changing the internal batch size for the Mongo driver won't fix it. The real problem is that, even after filtering, the list you are creating from the stream is too large for the Java heap size that you are using.
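To answer the literal question, the batch size can typically be hinted on the Query itself (a sketch, assuming Spring Data MongoDB 2.1+ and the same query(criteria) helper from your code; the batch size of 500 is just an illustrative value). But tuning it only changes how much the driver fetches per round trip, not the size of the list you end up building:

Query batched = query(criteria).cursorBatchSize(500); // hypothetical batch size hint
try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(batched, DocumentAnnotation.class, ANNOTATIONS_COLLECTION_NAME)) {
    // consume the iterator as before; memory per batch shrinks, the final list does not
}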

There are two possible approaches to solving this:

  • Instead of putting the annotations into a List, iterate the stream and process the annotations as you get them (see the sketch after this list).

  • Figure out a way to extract the annotations in batches. Then get a separate list of the annotations in each batch.
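
A minimal sketch of the first option, reusing the code from the question; processAnnotation(...) is a hypothetical stand-in for whatever per-annotation work you need to do:

try (CloseableIterator<DocumentAnnotation> iter =
         mongoTemplate.stream(query(criteria),
                              DocumentAnnotation.class,
                              ANNOTATIONS_COLLECTION_NAME)) {
    // consume each annotation as it comes off the cursor; only the driver's current
    // batch is held in memory, and no List of results is ever built
    while (iter.hasNext()) {
        DocumentAnnotation annotation = iter.next();
        if (isAnnotationAcceptedByFilter(annotation)) {
            processAnnotation(annotation); // hypothetical handler: do the work here, don't accumulate
        }
    }
}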

(In other circumstances, I would suggest trying to do the filtering in the MongoDB query itself. But that won't help to solve your OOME problem.)
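
For illustration only, query-side filtering could look something like the sketch below; the "type" field and acceptedTypes collection are hypothetical stand-ins for whatever isAnnotationAcceptedByFilter() actually checks. In your case it would only reduce what the driver transfers, not the size of the final list.

// hypothetical: push the filter into the query so MongoDB never returns unwanted documents
Criteria filteredCriteria = criteria.and("type").in(acceptedTypes);
List<DocumentAnnotation> annotations =
        mongoTemplate.find(query(filteredCriteria),
                           DocumentAnnotation.class,
                           ANNOTATIONS_COLLECTION_NAME);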

But if you need all of the annotations in memory at the same time in order to process them, then your only practical option will be to get more memory.
