
I'm writing a Python Selenium scraper for a web page that uses infinite scrolling to load content dynamically. Over time, as more posts are loaded, the JavaScript heap memory usage in ChromeDriver grows steadily until it eventually crashes with an Out of Memory error.

I tried removing old DOM elements using JavaScript after a certain number of scrolls, but it seems that memory is not actually being freed, or not enough to prevent a crash.

def clear_old_posts(driver):
    removed_counts = driver.execute_script("""
        let removed = { articles: 0, ads: 0, images: 0, videos: 0, scripts: 0, canvas: 0 };

        let articles = document.querySelectorAll('.infinite-scroll-component > div[data-testid^="message-"]');
        let ads = document.querySelectorAll('.infinite-scroll-component .py-2');
        let images = document.querySelectorAll('img');
        let videos = document.querySelectorAll('video, iframe');
        let scripts = document.querySelectorAll('script');
        let canvasElements = document.querySelectorAll('canvas');

        for (let i = 0; i < articles.length; i++) {
            articles[i].remove();
            removed.articles++;
        }

        images.forEach(img => { img.remove(); removed.images++; });
        videos.forEach(video => { video.remove(); removed.videos++; });
        scripts.forEach(script => { script.remove(); removed.scripts++; });
        ads.forEach(ad => { ad.remove(); removed.ads++; });
        canvasElements.forEach(canvas => { canvas.remove(); removed.canvas++; });

        window.stop();
        if (window.gc) window.gc();  // Not available unless Chrome launched with --js-flags="--expose-gc"

        return removed;
    """)
    return removed_counts
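
To check whether the removal above frees anything at all, the JS heap can be measured before and after a collection. This is only a rough sketch, assuming a Chromium-based driver that exposes `execute_cdp_cmd` (as Selenium 4's Chrome driver does). `heap_used_mb` and `collect_and_measure` are hypothetical helper names, not part of Selenium. `Runtime.getHeapUsage` and `HeapProfiler.collectGarbage` are Chrome DevTools Protocol methods, so `window.gc` and the `--expose-gc` flag are not needed here.

```python
def heap_used_mb(driver):
    """Return the page's used JS heap in MiB via the DevTools Protocol."""
    usage = driver.execute_cdp_cmd("Runtime.getHeapUsage", {})
    return usage["usedSize"] / (1024 * 1024)


def collect_and_measure(driver):
    """Request a garbage collection through CDP and report heap usage around it."""
    before = heap_used_mb(driver)
    # Ask the browser to collect; objects still reachable from page scripts stay alive.
    driver.execute_cdp_cmd("HeapProfiler.collectGarbage", {})
    after = heap_used_mb(driver)
    return {"before_mb": before, "after_mb": after, "freed_mb": before - after}
```

If `freed_mb` stays close to zero after calling `clear_old_posts(driver)`, that would suggest the page's own scripts still hold references to the removed nodes, which a collection cannot release.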

Refreshing or restarting the ChromeDriver is not an option, as that resets the scroll state and loses progress. I need to maintain the session to keep loading more posts.

Here are my questions:

  1. Why isn't memory freed after removing DOM elements? Is ChromeDriver retaining references somehow?
  2. Is there a way to force garbage collection or clear memory more aggressively from within a Selenium session?
  3. Is there any workaround that doesn't involve restarting the driver or refreshing the page?

P.S. The website is stocktwits.com, on the page for a specific symbol.

  • Not that I know of. Have you tried adding some filters to limit the number of posts and then looping through the filters? Commented Apr 4 at 22:00
  • @JeffC No, the only filter I want to apply is finding all posts about a specific symbol up to a specific date. However, because of the memory limit, the web driver crashes before it reaches that date. So I am looking for an iterative method: scroll to load new posts, scrape them, and then remove them to free up memory. Commented Apr 5 at 12:31
  • If the site infinitely appends to the DOM you should write it up as a bug. I would not remove anything from the DOM or use window.stop, or try to manually garbage collect. The browser should do collections when needed. If the site infinitely appends to the DOM it's a problem as I'm not sure it can garbage collect anything that's still being referenced. Commented Apr 7 at 19:57
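
The iterative approach described in the comments (scroll, scrape, then drop old posts) could look roughly like the sketch below. It is not a confirmed fix: `scrape_until`, `extract_posts`, `reached_cutoff`, `keep_last`, `pause`, and `max_rounds` are all hypothetical names introduced for illustration, and the selector is copied from the question. Removing nodes that the site's own framework manages may break its scrolling logic, and it does not guarantee that the heap shrinks.

```python
import time

# Removes all but the newest `keep` posts; the selector is taken from the question.
TRIM_SCRIPT = """
const keep = arguments[0];
const posts = document.querySelectorAll(
    '.infinite-scroll-component > div[data-testid^="message-"]');
let removed = 0;
for (let i = 0; i < posts.length - keep; i++) {
    posts[i].remove();
    removed++;
}
return removed;
"""


def scrape_until(driver, extract_posts, reached_cutoff,
                 keep_last=20, pause=2.0, max_rounds=1000):
    """Scroll, scrape, and trim in a loop until the cutoff condition is met.

    extract_posts(driver) must yield (post_id, data) pairs for posts in the DOM.
    reached_cutoff(seen) must return True once the target date has been reached.
    """
    seen = {}
    for _ in range(max_rounds):
        for post_id, data in extract_posts(driver):
            seen.setdefault(post_id, data)  # keep the first copy of each post
        if reached_cutoff(seen):
            break
        driver.execute_script(TRIM_SCRIPT, keep_last)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the next batch to load
    return seen
```

Keeping the last few posts in place, rather than removing everything, is a guess at what lets the page keep its scroll position. The scraped data lives in Python, so nothing is lost when the nodes are removed.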
