
OutOfMemoryError while streaming 20M+ rows in Spring Boot with Hibernate

Issue

I'm streaming 20M+ rows from my Content table using a Stream<Object[]> in Spring Boot with Hibernate. However, I encounter java.lang.OutOfMemoryError: Java heap space, even though I expect each row to be garbage-collected after processing.

Error Logs (Partial)

Mar 09 14:13:51 : dev-pedagogy-learner-progress Exception in thread "http-nio-8080-Poller" java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress Exception in thread "Catalina-utility-1" java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.685Z  INFO 1 --- [nio-8080-exec-2] o.s.core.annotation.MergedAnnotation     : Failed to introspect annotations on java.lang.Object org.springframework.boot.actuate.endpoint.web.servlet.AbstractWebMvcEndpointHandlerMapping$OperationHandler.handle(jakarta.servlet.http.HttpServletRequest,java.util.Map): java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.767Z ERROR 1 --- [         task-2] o.h.engine.jdbc.spi.SqlExceptionHelper   : Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.767Z  WARN 1 --- [         task-2] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: S1000

Code for Streaming Data

I am using a Hibernate Stream<Object[]> query with a fetch size of 1000. After processing a row, I want it to be removed from memory, but it seems like memory consumption keeps increasing.

try (Stream<Object[]> stream = contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
    stream.forEachOrdered(row -> {
        processRow(row, chapterToContentSet, contentToBatchProgressMap, programBatchToUserIdsMap);
        row = null; // Trying to free memory (reassigning the lambda parameter has no GC effect)
    });
}
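One way to check whether rows are really being streamed (rather than buffered client-side by the driver) is to log heap usage every N rows: flat usage points at true streaming, while steady growth points at buffering or at the accumulator maps. A minimal diagnostic sketch around the same repository call as above; the counter and logging interval are illustrative:

long[] rowCount = {0}; // array so the lambda can mutate it (captured locals must be effectively final)
try (Stream<Object[]> stream = contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
    stream.forEachOrdered(row -> {
        processRow(row, chapterToContentSet, contentToBatchProgressMap, programBatchToUserIdsMap);
        if (++rowCount[0] % 1_000_000 == 0) { // illustrative interval
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            System.out.printf("rows=%d, used heap=%d MB%n", rowCount[0], usedMb);
        }
    });
}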

Code for Processing Each Row

private void processRow(Object[] row,
                        Map<Long, Set<Long>> chapterToContentSet,
                        Map<Long, Map<Integer, Long>> contentToBatchProgressMap,
                        Map<Integer, List<Long>> programBatchToUserIdsMap) {
    Long chapterId = ((Number) row[0]).longValue();
    Long contentId = ((Number) row[1]).longValue();
    Long userId = ((Number) row[2]).longValue();
    Long progress = ((Number) row[3]).longValue();

    chapterToContentSet.computeIfAbsent(chapterId, k -> new HashSet<>()).add(contentId);

    for (Map.Entry<Integer, List<Long>> entry : programBatchToUserIdsMap.entrySet()) {
        Integer programBatchId = entry.getKey();
        List<Long> batchUserIds = entry.getValue();

        if (batchUserIds.contains(userId)) {
            contentToBatchProgressMap
                    .computeIfAbsent(contentId, k -> new HashMap<>())
                    .merge(programBatchId, progress, Long::sum);
        }
    }

    // Attempting to free memory (these assignments only clear local
    // references in this frame; they cannot force the row to be collected)
    Arrays.fill(row, null);
    row = null;
    chapterId = null;
    contentId = null;
    userId = null;
    progress = null;
    // System.gc(); // (Not recommended, just an experiment)
}
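Two observations on this method. First, chapterToContentSet and contentToBatchProgressMap necessarily retain data for every row processed, so some heap growth is expected even with perfect streaming. Second, unrelated to the OOM itself, batchUserIds.contains(userId) is a linear scan over a List, executed once per batch for each of the 20M rows. A hypothetical one-time inversion of programBatchToUserIdsMap (userIdToBatchIds is an invented name) would make the per-row lookup O(1):

// Build once, before streaming (hypothetical helper structure):
Map<Long, List<Integer>> userIdToBatchIds = new HashMap<>();
programBatchToUserIdsMap.forEach((batchId, userIds) ->
        userIds.forEach(uid ->
                userIdToBatchIds.computeIfAbsent(uid, k -> new ArrayList<>()).add(batchId)));

// Then, inside processRow, the loop over programBatchToUserIdsMap becomes:
for (Integer programBatchId : userIdToBatchIds.getOrDefault(userId, List.of())) {
    contentToBatchProgressMap
            .computeIfAbsent(contentId, k -> new HashMap<>())
            .merge(programBatchId, progress, Long::sum);
}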

Repository Query (Streaming)

@QueryHints({
    @QueryHint(name = "org.hibernate.fetchSize", value = "1000"),
    @QueryHint(name = "org.hibernate.cacheMode", value = "IGNORE")
})
@Query("SELECT c.chapterId, c.contentId, c.userId, c.progress FROM Content c " +
       "WHERE c.chapterId IN :chapterIds AND c.userId IN :userIds")
Stream<Object[]> streamContentProgress(@Param("chapterIds") List<Long> chapterIds,
                                       @Param("userIds") List<Long> userIds);

Entity Class

@Setter
@Getter
@RequiredArgsConstructor
@Entity
@SuperBuilder(toBuilder = true)
public class Content extends BaseEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "content_id", nullable = false)
    private Long contentId;

    @Column(name = "user_id", nullable = false)
    private Long userId;

    @Column(name = "progress", nullable = false)
    private Integer progress;

    @Column(name = "chapter_id", nullable = false)
    private Long chapterId;
}

What I Have Tried

Setting fetch size (1000) and disabling caching using @QueryHints → still seeing memory growth

Explicitly setting row = null and calling Arrays.fill(row, null) → no improvement

Calling System.gc() (not recommended, but tried for debugging) → no effect

Checked for Hibernate first-level caching (cacheMode = IGNORE) → no caching issue

Questions

1. Why is memory not being freed even though I'm using a stream and not collecting the data?

2. Does Stream<Object[]> still retain previous results in memory?

3. Should I explicitly detach entities or manually clear the Hibernate session?

4. Is there a better way to process large datasets efficiently without hitting OutOfMemoryError?

Any help would be appreciated!

Comments

  • 1. Try doing just simple logging in processRow and don't perform anything else. 2. Try enabling a heap dump and check which objects are using the majority of heap space. Commented Mar 9 at 14:20
  • Streaming will still load everything into the first-level cache eventually. So if you aren't evicting rows from the first-level cache and calling clear on the EntityManager after each fetch, your memory will fill up. (A sketch of this approach follows below.) Commented Mar 10 at 6:45
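A minimal sketch of the second commenter's suggestion, assuming an injected EntityManager and an arbitrary CLEAR_INTERVAL (both are illustrative, and the three accumulator maps are assumed in scope as in the original snippet). Note that this query projects scalar columns rather than managed entities, so whether the persistence context is really what fills up is worth confirming with the heap dump the first commenter suggests:

@PersistenceContext
private EntityManager entityManager;

private static final int CLEAR_INTERVAL = 10_000; // illustrative

public void streamAndAggregate(List<Long> allChapterIds, List<Long> allUserIds) {
    long[] processed = {0};
    try (Stream<Object[]> stream =
             contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
        stream.forEachOrdered(row -> {
            processRow(row, chapterToContentSet, contentToBatchProgressMap,
                       programBatchToUserIdsMap);
            // Periodically detach everything accumulated in the first-level cache
            if (++processed[0] % CLEAR_INTERVAL == 0) {
                entityManager.clear();
            }
        });
    }
}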
