I'm writing a Python Selenium scraper for a web page that uses infinite scrolling to load content dynamically. As more posts load, the JavaScript heap of the Chrome tab controlled by ChromeDriver grows steadily until Chrome eventually crashes with an out-of-memory error.
I tried removing old DOM elements with JavaScript after a certain number of scrolls, but the memory does not seem to be freed, or at least not enough to prevent the crash.
def clear_old_posts(driver):
    # Runs in the page and returns a dict with how many nodes of each kind were removed.
    removed_counts = driver.execute_script("""
        let removed = { articles: 0, ads: 0, images: 0, videos: 0, scripts: 0, canvas: 0 };
        let articles = document.querySelectorAll('.infinite-scroll-component > div[data-testid^="message-"]');
        let ads = document.querySelectorAll('.infinite-scroll-component .py-2');
        let images = document.querySelectorAll('img');
        let videos = document.querySelectorAll('video, iframe');
        let scripts = document.querySelectorAll('script');
        let canvasElements = document.querySelectorAll('canvas');
        for (let i = 0; i < articles.length; i++) {
            articles[i].remove();
            removed.articles++;
        }
        images.forEach(img => { img.remove(); removed.images++; });
        videos.forEach(video => { video.remove(); removed.videos++; });
        // Removing <script> elements does not unload code that has already run.
        scripts.forEach(script => { script.remove(); removed.scripts++; });
        ads.forEach(ad => { ad.remove(); removed.ads++; });
        canvasElements.forEach(canvas => { canvas.remove(); removed.canvas++; });
        window.stop();  // stops pending resource loads (e.g. images) in this tab
        if (window.gc) window.gc(); // Not available unless Chrome launched with --js-flags="--expose-gc"
        return removed;
    """)
    return removed_counts
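On the garbage-collection question: as the comment in the snippet notes, `window.gc` only exists when Chrome is started with `--js-flags=--expose-gc`. From Selenium, a possibly simpler route is the Chrome DevTools Protocol command `HeapProfiler.collectGarbage`, sent through `execute_cdp_cmd` (available on Chromium-based Selenium drivers), which needs no launch flag. The sketch below tries CDP first and falls back to `window.gc`. The helper name `force_gc` is hypothetical. Keep in mind that a forced GC can only reclaim objects nothing references any more, so it will not help if the page's own scripts still hold on to the removed posts.

```python
def force_gc(driver):
    """Ask V8 to run a garbage collection; return which method was used.

    Tries the Chrome DevTools Protocol first, then falls back to window.gc(),
    which only exists if Chrome was started with --js-flags=--expose-gc.
    Returns "cdp", "window.gc", or None if neither was available.
    """
    try:
        driver.execute_cdp_cmd("HeapProfiler.enable", {})
        driver.execute_cdp_cmd("HeapProfiler.collectGarbage", {})
        return "cdp"
    except Exception:
        # AttributeError if the driver has no execute_cdp_cmd (non-Chromium),
        # WebDriverException if the CDP call itself fails.
        pass
    exposed = driver.execute_script(
        "if (typeof window.gc === 'function') { window.gc(); return true; } return false;"
    )
    return "window.gc" if exposed else None
```

To use the `window.gc` fallback, the flag can be passed with `options.add_argument("--js-flags=--expose-gc")` on a `webdriver.ChromeOptions` object before the driver is created.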
Refreshing the page or restarting ChromeDriver is not an option, because either one resets the scroll state and loses my progress. I need to keep the same session alive so it can continue loading more posts.
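Since the session has to stay alive, one option is to prune inside the same scroll loop rather than in a separate pass. The sketch below reuses the question's post selector and removes all but the newest `keep` posts every `prune_every` scrolls. It deliberately leaves images, scripts, and `window.stop()` out, to reduce the chance of interfering with the next batch. The function name, the `keep`/`prune_every`/`pause` defaults, and the assumption that the feed scrolls the window (rather than an inner scrollable container) are hypothetical. Also, if the page keeps the post data in its own JavaScript state, removing DOM nodes will not release that data.

```python
import time

# Post container selector copied from the question.
POST_SELECTOR = '.infinite-scroll-component > div[data-testid^="message-"]'

# Removes all but the newest N posts. arguments[0] and arguments[1] are the
# extra values passed to execute_script after the script string.
PRUNE_JS = """
const posts = document.querySelectorAll(arguments[0]);
const excess = posts.length - arguments[1];
for (let i = 0; i < excess; i++) {
    posts[i].remove();
}
return Math.max(excess, 0);
"""


def scroll_with_pruning(driver, scrolls, prune_every=20, keep=50, pause=1.5):
    """Scroll to the bottom repeatedly, pruning old posts every `prune_every` scrolls.

    The page is never refreshed, so its own pagination state is preserved.
    Returns the total number of post nodes removed.
    """
    removed_total = 0
    for i in range(1, scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch the next batch
        if i % prune_every == 0:
            removed_total += driver.execute_script(PRUNE_JS, POST_SELECTOR, keep)
    return removed_total
```

If the feed lives in its own scrollable element, the `window.scrollTo` call would need to target that element instead.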
Here are my questions:
- Why isn't memory freed after removing DOM elements? Is ChromeDriver retaining references somehow?
- Is there a way to force garbage collection or clear memory more aggressively from within a Selenium session?
- Is there any workaround that doesn't involve restarting the driver or refreshing the page?
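For the first question, one way to tell whether something still holds the removed nodes is to compare Chrome's live node count and heap size before and after pruning followed by a GC. This is a minimal sketch, assuming a Chromium-based driver with `execute_cdp_cmd`. `Memory.getDOMCounters` and `Runtime.getHeapUsage` are DevTools Protocol commands, while `memory_snapshot` and `compare_cleanup` are hypothetical helper names. If the node count barely drops after a GC, the removed nodes are most likely still reachable from somewhere (for example the page's own rendering code or event listeners); this check alone does not tell you which.

```python
def memory_snapshot(driver):
    """Read live DOM counters and JS heap usage from Chrome via CDP."""
    counters = driver.execute_cdp_cmd("Memory.getDOMCounters", {})
    heap = driver.execute_cdp_cmd("Runtime.getHeapUsage", {})
    return {
        "nodes": counters["nodes"],
        "listeners": counters["jsEventListeners"],
        "heap_mb": round(heap["usedSize"] / (1024 * 1024), 1),
    }


def compare_cleanup(driver, cleanup, collect):
    """Snapshot, run cleanup then a GC, snapshot again, and return what was freed.

    `cleanup` and `collect` are callables that take the driver.
    Positive values mean the count or heap size went down.
    """
    before = memory_snapshot(driver)
    cleanup(driver)
    collect(driver)
    after = memory_snapshot(driver)
    return {key: before[key] - after[key] for key in before}
```

For example, `compare_cleanup(driver, clear_old_posts, lambda d: d.execute_cdp_cmd("HeapProfiler.collectGarbage", {}))` would show how much the question's cleanup actually releases.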
P.S.: The website is stocktwits.com, on the page for a specific symbol.