The context is SpringerLink. Take, for example, this series of books: GTM.

I want to get the information located at the bottom of each book's webpage:

book info

All I want is the E-ISBN information on each page.

Is there a way (not limited to Selenium) to enumerate each book page and get this information?

  • Your question is too broad. Commented Jan 17, 2023 at 16:00
  • @Prophet I will edit this question to narrow the scope to getting only the E-ISBN information; downloading requires authentication, which is much more difficult. Commented Jan 17, 2023 at 16:08
  • You should provide your code trials and explain exactly what did not work, which errors you faced, etc. Commented Jan 17, 2023 at 16:11

2 Answers

For this easy task you can use either Selenium or BeautifulSoup, but the latter is easier and faster, so let's use it to get the titles and E-ISBN codes.

First install BeautifulSoup (and requests, if you don't already have it) with the command pip install beautifulsoup4 requests.

Method 1 (faster): get E-ISBN directly from books list

Notice that each entry in the books list has an eBook link of the form https://www.springer.com/book/9783031256325, where 9783031256325 is the EISBN code without the - characters.

So we can get the EISBN codes directly from those urls, without the need to load a new page for each book:

import requests
from bs4 import BeautifulSoup

url = 'https://www.springer.com/series/136/books'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
titles = [title.text.strip() for title in soup.select('.c-card__title')]
EISBN = []
for a in soup.select('ul:last-child .c-meta__item:last-child a'):
    c = a['href'].split('/')[-1] # a['href'] is something like https://www.springer.com/book/9783031256325
    EISBN.append( f'{c[:3]}-{c[3]}-{c[4:7]}-{c[7:12]}-{c[-1]}' ) # insert four '-' in the number 9783031256325 to create the E-ISBN code

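# zip pairs each code with its title and stops at the shorter list,
# so a missing eBook link cannot raise an IndexError here
for isbn, title in zip(EISBN, titles):
    print(isbn, title)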

Output

978-3-031-25632-5 Random Walks on Infinite Groups
978-3-031-19707-9 Drinfeld Modules
978-3-031-13379-4 Partial Differential Equations
978-3-031-00943-3 Stationary Processes and Discrete Parameter Markov Processes
978-3-031-14205-5 Measure Theory, Probability, and Stochastic Processes
978-3-030-56694-4 Quaternion Algebras
978-3-030-73839-6 Mathematical Logic
978-3-030-71250-1 Lessons in Enumerative Combinatorics
978-3-030-35118-2 Basic Representation Theory of Algebras
978-3-030-59242-4 Ergodic Dynamics
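
The slicing in Method 1 silently assumes that every eBook URL ends in a 13-digit number and that the hyphens always fall in the 978-3-XXX-XXXXX-X pattern seen in the output above. A minimal sketch of a helper that checks the first assumption before formatting is shown below. The names is_valid_isbn13 and eisbn_from_url are hypothetical, and the fixed hyphen layout is still an assumption carried over from the answer, so titles with a different prefix layout would need different hyphen positions.

```python
from urllib.parse import urlparse


def is_valid_isbn13(digits: str) -> bool:
    """Return True if `digits` is 13 ASCII digits with a correct ISBN-13 check digit."""
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # ISBN-13 rule: weights alternate 1, 3, 1, 3, ... and the weighted sum
    # of all 13 digits must be a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def eisbn_from_url(url: str) -> str:
    """Take the last path segment of an eBook URL and hyphenate it.

    Assumption: the fixed 3-1-3-5-1 grouping used in the answer applies.
    """
    digits = urlparse(url).path.rstrip('/').split('/')[-1]
    if not is_valid_isbn13(digits):
        raise ValueError(f'not a valid ISBN-13: {digits!r}')
    return f'{digits[:3]}-{digits[3]}-{digits[4:7]}-{digits[7:12]}-{digits[12]}'


print(eisbn_from_url('https://www.springer.com/book/9783031256325'))
```

Inside the loop of Method 1, EISBN.append(eisbn_from_url(a['href'])) could replace the two slicing lines, so that an unexpected link raises an error instead of producing a malformed code.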

Method 2 (slower): get E-ISBN by loading a page for each book

This method loads the details page for each book and extracts the EISBN code from there:

# reuses the imports and the url variable from Method 1
soup = BeautifulSoup(requests.get(url).text, "html.parser")
books = soup.select('a[data-track-label^="article"]')
titles, EISBN = [], []

for book in books:
    titles.append(book.text.strip())
    soup_book = BeautifulSoup(requests.get(book['href']).text, "html.parser")
    EISBN.append( soup_book.select('p:has(span[data-test=electronic_isbn_publication_date]) .c-bibliographic-information__value')[0].text )

In case you are wondering, p:has(span[data-test=electronic_isbn_publication_date]) selects the p element that contains a span with the attribute data-test=electronic_isbn_publication_date.
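
To see what the :has() part contributes, here is a small offline sketch that runs the same selector on a made-up HTML fragment. The fragment only imitates the structure implied by the selector, and the data-test value hypothetical_print_isbn is invented for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up fragment: two paragraphs that share the same value class.
# Only the second one contains the span with the electronic-ISBN data-test attribute.
html = """
<div>
  <p><span data-test="hypothetical_print_isbn">Hardcover ISBN</span>
     <span class="c-bibliographic-information__value">978-3-031-25631-8</span></p>
  <p><span data-test="electronic_isbn_publication_date">eBook ISBN</span>
     <span class="c-bibliographic-information__value">978-3-031-25632-5</span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Without :has(), the class selector matches the value in both paragraphs.
all_values = [el.text for el in soup.select('.c-bibliographic-information__value')]

# With :has(), only the paragraph containing the matching span is considered.
eisbn = soup.select_one(
    'p:has(span[data-test=electronic_isbn_publication_date]) '
    '.c-bibliographic-information__value'
).text

print(all_values)
print(eisbn)
```

The class alone is ambiguous because several fields on the page can share it, which is why the selector first narrows the search to the right paragraph.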


You can open each book through its link within the website in a separate tab. After switching to the new tab, induce WebDriverWait for visibility_of_element_located() and extract any of the desired info. As an example, to extract the Hardcover ISBN you can use the following locator strategy:

  • Code Block:

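    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumption: Chrome; any supported browser driver works
    driver.get('https://www.springer.com/series/136/books')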
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-cc-action='accept']"))).click()
    hrefs = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-track='click'][data-track-label^='article'][href]")))]
    for href in hrefs:
        main_window = driver.current_window_handle
        driver.execute_script("window.open('" + href +"');")
        WebDriverWait(driver, 5).until((EC.number_of_windows_to_be(2)))
        windows_after = driver.window_handles
        new_window = [handle for handle in windows_after if handle != main_window][0]
        driver.switch_to.window(new_window)
        print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Hardcover ISBN']//following::span[@class='c-bibliographic-information__value']"))).text)
        driver.close()
        driver.switch_to.window(main_window)
    driver.quit()
    
  • Console Output:

    978-3-031-25631-8
    978-3-031-19706-2
    978-3-031-13378-7
    978-3-031-00941-9
    978-3-031-14204-8
    978-3-030-56692-0
    978-3-030-73838-9
    978-3-030-71249-5
    978-3-030-35117-5
    978-3-030-59241-7
    
