I’m using Python + Selenium + ChromeDriver to check a list of titles (from a CSV file) against an online library catalog. My script searches each title and tries to determine if a specific library has it.

The issue is that even when I can see in the browser that the title is available, my script still reports it as “not found.”

After inspecting the site, I realized:

  • The first results page only shows a summary like “1 library has this title”, without listing the libraries.
  • You have to click the title link to open a details page that contains a holdings table (<table id="dpCentralHoldingsDetails">) showing which libraries own the item.
  • My script doesn’t reliably navigate to this page or wait long enough for the holdings table to load (since it uses headless Chrome).

import csv
import time
import random
import urllib.parse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

CSV_FILE = "/Users/hudaabbasi/Desktop/mel_catalog_checker/horizon_dvds.csv"
OUTPUT_FILE = "/Users/hudaabbasi/Desktop/mel_catalog_checker/not_found.csv"
CHROMEDRIVER_PATH = "/Users/hudaabbasi/Desktop/chromedriver"

options = Options()
options.add_argument("--headless=new")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--window-size=1920,1080")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--disable-dev-shm-usage")

service = Service(CHROMEDRIVER_PATH)
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 20)

with open(CSV_FILE, "r", encoding="utf-8", errors="ignore") as f:
    reader = csv.DictReader(f)
    rows = list(reader)

not_found = []

def check_melcat(title):
    """Search the catalog for a title and return True if found under a specific library."""
    try:
        query = urllib.parse.quote_plus(title)
        url = f"https://search.mel.org/iii/encore/search/C__S{query}__Orightresult__U?lang=eng&suite=gold"
        driver.get(url)

        # Wait for either results or no results
        wait.until(lambda d: d.find_elements(By.CSS_SELECTOR, "a.institutionCount, .noResultsText"))

        # Case 1: No results
        if driver.find_elements(By.CSS_SELECTOR, ".noResultsText"):
            return False

        # Case 2: Results exist — click first record
        record_link = driver.find_elements(By.CSS_SELECTOR, "a.institutionCount")
        if not record_link:
            return False

        href = record_link[0].get_attribute("href")
        driver.get(href)

        # Wait for the holdings table to appear
        wait.until(EC.presence_of_element_located((By.ID, "dpCentralHoldingsDetails")))

        # Extract all table cells
        tds = driver.find_elements(By.CSS_SELECTOR, "#dpCentralHoldingsDetails td")

        for td in tds:
            text = td.text.strip().lower()
            if "dearborn public library" in text and "heights" not in text:
                return True

        return False

    except Exception as e:
        print(f"⚠️ Error searching '{title}': {e}")
        with open("debug_lastpage.html", "w", encoding="utf-8") as dbg:
            dbg.write(driver.page_source)
        return False

for row in rows:
    title = row.get("Title") or row.get("title") or list(row.values())[0]
    title = title.strip() if title else ""
    barcode = row.get("Barcode", "")

    print(f"🔍 Checking '{title}' ...")

    found = False
    retries = 2
    for attempt in range(retries):
        found = check_melcat(title)
        if found or attempt == retries - 1:
            break
        print("⏳ Retrying...")
        time.sleep(3)

    if found:
        print(f"'{title}' is listed for the library.")
    else:
        print(f"'{title}' NOT found for the library.")
        not_found.append({"Title": title, "Barcode": barcode})

    # polite random delay
    time.sleep(random.uniform(2, 4))

if not_found:
    pd.DataFrame(not_found).to_csv(OUTPUT_FILE, index=False, encoding="utf-8")
    print(f"\n Done! Missing titles saved to '{OUTPUT_FILE}'.")
else:
    print("\n All items are listed!")

driver.quit()

What I tried and what I expected

I used WebDriverWait and time.sleep() after clicking the result link to wait for the holdings table to appear, but it still didn’t always load before Selenium tried to read it. I expected the script to find the table and detect the target library name, but instead it keeps returning “not found” for every title.

What I want to know

  • How can I make Selenium wait reliably until the holdings table (dpCentralHoldingsDetails) is fully loaded?
  • How can I ensure my script correctly checks whether a specific library name appears in that table?

Comments

  • There's no way to help without a concrete example of the search URL. Questions must provide a minimal reproducible example. Commented Oct 11 at 14:02
  • Always include the full error message (traceback), because it contains other useful information. Commented Oct 11 at 16:35
  • Does your code work when you remove --headless? Do you see the expected elements when you run without --headless? Maybe the page doesn't have dpCentralHoldingsDetails at all. Commented Oct 11 at 16:37
  • You should put example data directly in the code instead of a path to a CSV file, because we can't run it to test the problem. Commented Oct 11 at 17:12
  • Maybe you should write driver.page_source to a file to see what you really get from the server (when you run headless). Maybe it sends a CAPTCHA or a warning because it detected that you are using a script. Commented Oct 11 at 19:12

1 Answer


You didn't include the contents of the CSV, so I couldn't test all of the code, but I found a few problems and fixed them:

  1. Selenium Manager, added in Selenium 4.6, automatically downloads and configures the appropriate driver for you. So, you no longer need a separate driver manager or an explicit driver path.

  2. .noResultsText does not locate an element on either the results page or the no-results page. I'm not sure where that came from, but it may be a source of some of the issues.

  3. The main thing I would suggest is to (generally) never wrap all of your code in a single try-except. It swallows every exception and makes finding issues extremely difficult. If you reasonably expect a small block of code to throw a specific exception, wrap only that block and catch ONLY that exception. Catching Exception is bad practice.

  4. In Selenium terms, presence means the element exists in the DOM, but it does not guarantee that the element is ready to be interacted with. Selenium was designed to act like a user, which means that it throws exceptions if you try to interact with an element that is not visible. If you are going to click an element, wait for it to be clickable with EC.element_to_be_clickable. For other interactions, wait for it to be visible with EC.visibility_of_element_located. This could be part of your issue with the holdings table.

  5. Instead of grabbing all the table cells and then looking for the specific library in ALL of them, you can use an XPath that only finds the cells that contain the desired library.
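
As a rough sketch of point 5, the XPath can be built by a small helper so that the library name is quoted safely and, if needed, a look-alike name is excluded. The helper names (xpath_literal, holdings_xpath) and the exclude parameter are made up for illustration; only the table id and the contains() pattern come from the code in this post. Keep in mind that XPath contains() is case-sensitive, unlike the lowercased comparison in the question.

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: glue the pieces together with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def holdings_xpath(library_name: str, exclude: str = "") -> str:
    """Build an XPath for holdings cells that mention library_name.

    If exclude is given, cells that also contain that text are skipped.
    """
    condition = f"contains(., {xpath_literal(library_name)})"
    if exclude:
        condition += f" and not(contains(., {xpath_literal(exclude)}))"
    return f"//table[@id='dpCentralHoldingsDetails']//td[{condition}]"


print(holdings_xpath("Kalamazoo Public Library"))
print(holdings_xpath("Dearborn Public Library", exclude="Heights"))
```

The resulting string can be passed to wait.until(EC.visibility_of_element_located((By.XPATH, ...))) in place of a hand-written expression.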

NOTE: There is a shortcut I found: you can search by location, e.g. Dearborn MI. After you do your search, if there are results, you can use "Location" under "Refine by:" in the left panel. Drill down to Dearborn and then select that. That gives you a new URL you can use:

https://search.mel.org/iii/encore/search/C__S{query}__Ff%3Afacettopicplace%3ADearborn%3ADearborn%3ADearborn%3A%3A__Orightresult__U__X0?lang=eng&suite=gold

That should eliminate 99% of the bad results and save a ton of time, but I wasn't able to find a way to search for books only in a specific library. So, you still have to search for the library name in the holdings table. I put the Dearborn-specific URL into my code as url2, in case you want to use it.
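
If you want to switch between the two URLs, a minimal sketch could look like the following. The function name build_search_url and the dearborn_only flag are hypothetical; the URL patterns are the ones shown in this post, and the title is encoded with quote_plus as in the question's code (I am assuming the catalog accepts that encoding, since I only tested single-word queries).

```python
from urllib.parse import quote_plus

BASE = "https://search.mel.org/iii/encore/search/"
# Facet segment copied from the Dearborn-specific URL above
DEARBORN_FACET = "__Ff%3Afacettopicplace%3ADearborn%3ADearborn%3ADearborn%3A%3A"


def build_search_url(title: str, dearborn_only: bool = False) -> str:
    """Return the catalog search URL for a title (hypothetical helper)."""
    query = quote_plus(title)  # encode spaces and punctuation
    if dearborn_only:
        return f"{BASE}C__S{query}{DEARBORN_FACET}__Orightresult__U__X0?lang=eng&suite=gold"
    return f"{BASE}C__S{query}__Orightresult__U?lang=eng&suite=gold"


print(build_search_url("python"))
print(build_search_url("star wars", dearborn_only=True))
```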

NOTE: I left out the CSV-related and other code so that I could get it working on my machine. Instead, I hard-coded two searches: one that returns results, "python", and one that doesn't, "zaqw". You'll need to add the rest of your code back in.

Here's the updated working code.

from urllib.parse import quote_plus

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium Manager (Selenium 4.6+) locates the driver, so no path is needed
driver = webdriver.Chrome()
driver.maximize_window()

wait = WebDriverWait(driver, 5)
library_name = "Kalamazoo Public Library"
queries = [
    "zaqw",    # no results
    "python",  # results
]

def check_melcat(title):
    """Search the catalog for a title and return True if found under a specific library."""

    # URL-encode the title so spaces and punctuation survive in the URL
    query = quote_plus(title)
    url = f'https://search.mel.org/iii/encore/search/C__S{query}__Orightresult__U?lang=eng&suite=gold'
    # Dearborn specific search, if you want to use it (pass url2 to driver.get() instead of url)
    url2 = f'https://search.mel.org/iii/encore/search/C__S{query}__Ff%3Afacettopicplace%3ADearborn%3ADearborn%3ADearborn%3A%3A__Orightresult__U__X0?lang=eng&suite=gold'
    driver.get(url)

    # wait for results element to be visible
    results = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.encore-search-result-summary")))

    # Case 1: No results, exit early
    if "No catalog results found for" in results.text:
        return False

    # Case 2: Results exist - click first record
    driver.find_element(By.CSS_SELECTOR, "a.institutionCount").click()

    # Wait for a holdings table cell that contains the library name to be visible
    try:
        wait.until(EC.visibility_of_element_located((By.XPATH, f"//table[@id='dpCentralHoldingsDetails']//td[contains(.,'{library_name}')]")))
        return True
    except TimeoutException:
        # no matching cell appeared within the timeout
        return False

try:
    for title in queries:
        print(title, check_melcat(title))
finally:
    # always close the browser, even if a search raises
    driver.quit()

Output

zaqw False
python True
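
To add the CSV handling back in, one option is to keep the loop separate from Selenium by passing the checker in as a function, which also makes it possible to try the loop with inline sample data instead of a file path. This is only a sketch: find_missing and fake_checker are hypothetical names, the Title and Barcode columns are taken from the question's code, and the delay between retries is left out for brevity.

```python
import csv
import io


def find_missing(csv_text, checker, retries=2):
    """Return the rows whose title the checker never reports as found."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = (row.get("Title") or "").strip()
        if not title:
            continue  # skip rows without a title
        # any() stops calling the checker after the first successful attempt
        if not any(checker(title) for _ in range(retries)):
            missing.append({"Title": title, "Barcode": row.get("Barcode", "")})
    return missing


# Inline sample data and a stand-in checker; in the real script,
# pass check_melcat as the checker instead.
sample = "Title,Barcode\npython,111\nzaqw,222\n"
fake_checker = lambda title: title == "python"
print(find_missing(sample, fake_checker))
# prints [{'Title': 'zaqw', 'Barcode': '222'}]
```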

1 Comment

Dynamic loads are a bit of a pain. To avoid waiting for specific elements, I use a more general approach: wait for any jQuery script to finish and, if that does not help, wait for the HTML to be stable. The code is at github.com/dornech/utils-seleniumxp/blob/main/src/… (see wait4HTMLstable).
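
A minimal sketch of the "wait for the HTML to be stable" idea, assuming it means polling until the page source stops changing. This is not the code from the linked repository; wait_until_stable and its parameters are hypothetical, and the snapshot function is passed in so the logic can be tried without a browser.

```python
import time


def wait_until_stable(get_snapshot, stable_checks=3, interval=0.5, timeout=20.0):
    """Poll get_snapshot() until it returns the same value
    stable_checks times in a row, then return that value."""
    deadline = time.monotonic() + timeout
    last = get_snapshot()
    streak = 1  # number of consecutive identical snapshots seen so far
    while streak < stable_checks:
        if time.monotonic() > deadline:
            raise TimeoutError("snapshot did not stabilise in time")
        time.sleep(interval)
        current = get_snapshot()
        streak = streak + 1 if current == last else 1
        last = current
    return last


# Stand-in for a page that changes twice and then settles
snapshots = iter(["a", "b", "c", "c", "c"])
print(wait_until_stable(lambda: next(snapshots), interval=0))
# prints c
```

With Selenium, the snapshot function could be lambda: driver.page_source. Comparing full page sources can be slow on large pages, so a hash or the text of one container element may be a better snapshot.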
