OutOfMemoryError while streaming 20M+ rows in Spring Boot with Hibernate

Issue

I’m streaming 20M+ rows from my Content table using a Stream<Object[]> in Spring Boot with Hibernate. However, I encounter java.lang.OutOfMemoryError: Java heap space, even though I expect each row to be garbage-collected after processing.
Error Logs (Partial)
Mar 09 14:13:51 : dev-pedagogy-learner-progress Exception in thread "http-nio-8080-Poller" java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress Exception in thread "Catalina-utility-1" java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.685Z  INFO 1 --- [nio-8080-exec-2] o.s.core.annotation.MergedAnnotation     : Failed to introspect annotations on java.lang.Object org.springframework.boot.actuate.endpoint.web.servlet.AbstractWebMvcEndpointHandlerMapping$OperationHandler.handle(jakarta.servlet.http.HttpServletRequest,java.util.Map): java.lang.OutOfMemoryError: Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.767Z ERROR 1 --- [         task-2] o.h.engine.jdbc.spi.SqlExceptionHelper   : Java heap space
Mar 09 14:13:51 : dev-pedagogy-learner-progress 2025-03-09T08:43:51.767Z  WARN 1 --- [         task-2] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: S1000
Code for Streaming Data
I am using a Hibernate Stream<Object[]> query with a fetch size of 1000. After each row is processed I expect it to become eligible for garbage collection, but memory consumption keeps increasing.
try (Stream<Object[]> stream = contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
    stream.forEachOrdered(row -> {
        processRow(row, chapterToContentSet, contentToBatchProgressMap, programBatchToUserIdsMap);
        row = null; // Trying to free memory
    });
}
Code for Processing Each Row
private void processRow(Object[] row,
                        Map<Long, Set<Long>> chapterToContentSet,
                        Map<Long, Map<Integer, Long>> contentToBatchProgressMap,
                        Map<Integer, List<Long>> programBatchToUserIdsMap) {
    Long chapterId = ((Number) row[0]).longValue();
    Long contentId = ((Number) row[1]).longValue();
    Long userId    = ((Number) row[2]).longValue();
    Long progress  = ((Number) row[3]).longValue();

    chapterToContentSet.computeIfAbsent(chapterId, k -> new HashSet<>()).add(contentId);

    for (Map.Entry<Integer, List<Long>> entry : programBatchToUserIdsMap.entrySet()) {
        Integer programBatchId = entry.getKey();
        List<Long> batchUserIds = entry.getValue();
        if (batchUserIds.contains(userId)) {
            contentToBatchProgressMap
                    .computeIfAbsent(contentId, k -> new HashMap<>())
                    .merge(programBatchId, progress, Long::sum);
        }
    }

    // Attempting to free memory
    Arrays.fill(row, null);
    row = null;
    chapterId = null;
    contentId = null;
    userId = null;
    progress = null;
    // System.gc(); // (Not recommended, just an experiment)
}
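As an aside on per-row cost (this affects CPU more than heap): the loop above scans every entry of programBatchToUserIdsMap and calls List.contains, which is O(n), once per batch per row. A minimal sketch, with a hypothetical invertBatchMap helper, of building a userId → batch-ids index once before streaming so each row does a single hash lookup:

// Hypothetical helper: invert the batch map once, before streaming starts.
private Map<Long, List<Integer>> invertBatchMap(Map<Integer, List<Long>> programBatchToUserIdsMap) {
    Map<Long, List<Integer>> userToBatchIds = new HashMap<>();
    programBatchToUserIdsMap.forEach((batchId, userIds) ->
            userIds.forEach(userId ->
                    userToBatchIds.computeIfAbsent(userId, k -> new ArrayList<>()).add(batchId)));
    return userToBatchIds;
}

// Then the per-row loop inside processRow could become:
// for (Integer programBatchId : userToBatchIds.getOrDefault(userId, List.of())) {
//     contentToBatchProgressMap
//             .computeIfAbsent(contentId, k -> new HashMap<>())
//             .merge(programBatchId, progress, Long::sum);
// }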
Repository Query (Streaming)
@QueryHints({
    @QueryHint(name = "org.hibernate.fetchSize", value = "1000"),
    @QueryHint(name = "org.hibernate.cacheMode", value = "IGNORE")
})
@Query("SELECT c.chapterId, c.contentId, c.userId, c.progress FROM Content c " +
       "WHERE c.chapterId IN :chapterIds AND c.userId IN :userIds")
Stream<Object[]> streamContentProgress(@Param("chapterIds") List<Long> chapterIds,
                                       @Param("userIds") List<Long> userIds);
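One hedged note, since the database isn’t shown (the SQLState S1000 in the log is typical of MySQL): MySQL Connector/J buffers the entire result set client-side by default, so a positive fetch size like 1000 does not actually stream rows. The two known ways to make Connector/J stream, sketched under that assumption:

// Option A: row-by-row streaming; Connector/J treats Integer.MIN_VALUE as a
// special value (requires a forward-only, read-only result set, which is the default).
@QueryHints({
    @QueryHint(name = "org.hibernate.fetchSize", value = "-2147483648"), // Integer.MIN_VALUE
    @QueryHint(name = "org.hibernate.cacheMode", value = "IGNORE")
})

// Option B: keep fetchSize = 1000, but enable server-side cursors on the JDBC URL:
// jdbc:mysql://host:3306/mydb?useCursorFetch=true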
Entity Class
@Setter
@Getter
@RequiredArgsConstructor
@Entity
@SuperBuilder(toBuilder = true)
public class Content extends BaseEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "content_id", nullable = false)
    private Long contentId;

    @Column(name = "user_id", nullable = false)
    private Long userId;

    @Column(name = "progress", nullable = false)
    private Integer progress;

    @Column(name = "chapter_id", nullable = false)
    private Long chapterId;
}
What I Have Tried
Setting the fetch size (1000) and disabling caching using @QueryHint → still seeing memory growth
Explicitly setting row = null and calling Arrays.fill(row, null) → no improvement
Calling System.gc() (not recommended, but tried for debugging) → no effect
Checked for Hibernate first-level caching (cacheMode = IGNORE) → no caching issue
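To isolate where the growth comes from, a minimal experiment worth trying (a sketch; log is assumed to exist, e.g. via Lombok’s @Slf4j):

try (Stream<Object[]> stream = contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
    long rows = stream.count(); // terminal operation; no per-row state is retained
    log.info("Streamed {} rows without accumulating anything", rows);
}
// If this completes cleanly, the maps populated in processRow (which grow with
// every row) are the likely culprit; if it still fails, the JDBC driver is
// buffering the whole result set despite the fetch-size hint.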
Questions
Why is memory not being freed up even though I’m using a stream and not collecting data?
Does Stream<Object[]> still retain previous results in memory?
Should I explicitly detach entities or manually clear the Hibernate session?
Is there a better way to process large datasets efficiently without hitting OutOfMemoryError?
Any help would be appreciated!
Answer

1. Try streaming the rows without calling processRow, and don't perform anything else, to see whether the stream itself is what fills the heap.
2. Try to enable a heap dump, and check what object is using the majority of heap space.

Also: unless you call clear() on the EntityManager after each fetch, your memory will fill up.
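A sketch of what those suggestions could look like in practice. Assumptions: an injected EntityManager, the three maps are fields as in the original snippet, and the stream is consumed inside a read-only transaction (Spring Data requires an open transaction for Stream-returning query methods):

// Heap dump on OOM: start the JVM with
//   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps
// then inspect the .hprof in Eclipse MAT or VisualVM to see what dominates the heap.

import java.util.concurrent.atomic.AtomicLong;

@PersistenceContext
private EntityManager entityManager; // assumed injected

@Transactional(readOnly = true)
public void aggregateProgress(List<Long> allChapterIds, List<Long> allUserIds) {
    AtomicLong seen = new AtomicLong();
    try (Stream<Object[]> stream = contentRepository.streamContentProgress(allChapterIds, allUserIds)) {
        stream.forEachOrdered(row -> {
            processRow(row, chapterToContentSet, contentToBatchProgressMap, programBatchToUserIdsMap);
            // Periodically detach everything Hibernate is still tracking for this
            // session; cheap for a scalar projection, essential when entities are loaded.
            if (seen.incrementAndGet() % 10_000 == 0) {
                entityManager.clear();
            }
        });
    }
}

Note that clearing helps only with what the persistence context retains; if the heap dump shows the aggregation maps dominating, the growth is proportional to the data being collected, not to the streaming itself.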