Update: The script provided by Jonas has solved most of the problems. Now I am trying to find a way to use the date picker or send_keys to set the date range, since the page falls back to a single day every time I re-run the code.

date_start = driver.find_element(By.XPATH, 'date_from')
date_end = driver.find_element(By.XPATH, 'date_to')
date_start.send_keys("2021-09-24")
date_end.send_keys("2021-10-01")

Original Problem: I am using Selenium with the Chrome WebDriver to extract data from a table that cannot be highlighted for copy and paste on the website. When I tried to extract the data with BeautifulSoup, I found that the data is assigned inside a JavaScript function. The HTML code for the JavaScript table looks like this:

<script>

  function initTableData() {
    window.initialAnalystData = [{"action_company":"Initiates Coverage On","action_pt":"Announces","analyst":"BTIG","analyst_name":"James Sullivan","currency":"USD","lastTradePrice":24.89},"logo":null}];
    window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};

          window.initialAnalystTime = "11:27";
      }

  initTableData();

</script>

I am new to both Selenium and JavaScript, but I have tried the following code to get the data list and it is not working.

element = driver.find_element(By.TAG_NAME, "script")
htmlCode = driver.execute_script("return arguments[0].innerHTML;", element)

What should I try next? The website link is here.

Thanks!

  • If you're going to run JavaScript, you could just get window.initialAnalystData, etc. You could also just get the info you need from the DOM. Commented Oct 1, 2021 at 18:23
  • Thanks for the suggestions! I will definitely try to run it later! Commented Oct 2, 2021 at 18:07
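
The first comment's suggestion can be sketched roughly as follows: ask the browser for window.initialAnalystData directly instead of parsing the script text. Selenium's execute_script converts a returned JavaScript array of objects into a Python list of dicts. The function names fetch_analyst_data and index_by are hypothetical, and the sketch assumes the page has finished loading before it is called.

```python
def fetch_analyst_data(driver):
    """Return window.initialAnalystData from the loaded page as a list of dicts.

    `driver` is a Selenium WebDriver that has already opened the page.
    An empty list is returned if the page has not defined the variable.
    """
    rows = driver.execute_script("return window.initialAnalystData;")
    return rows or []


def index_by(rows, key):
    """Build a dict of rows keyed by `key`; later rows overwrite earlier ones."""
    return {row[key]: row for row in rows if key in row}
```

Usage would look like rows = fetch_analyst_data(driver) followed by index_by(rows, "name")["APA"], once driver.get(url) has completed.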

1 Answer

You could use a regular expression to find that part of the page source and then work with it:

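from selenium import webdriver
import time
import re

url = 'https://www.benzinga.com/analyst-ratings'
driver = webdriver.Chrome()  # the driver has to exist before driver.get() is called
driver.get(url)
time.sleep(5)  # let the page load all the data first

html_source = driver.page_source
# Take the first match, drop the 29-character prefix 'window.initialAnalystData = [',
# then split the rest into one chunk per '{'.
raw_data = re.findall(r'window\.initialAnalystData = .*;', html_source)[0][29:].split('{')[1:]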


# Clean the data if you want (just one possible way out of many!):

cleaned_data = {}
for data in raw_data:
    details_to_dic = {}
    for details in data.split(','):
        # Strip the quotes, then split into key and value at the colons
        details_temp = details.replace('"', '').split(':')
        if len(details_temp) > 1:  # skip fragments that are not key:value pairs
            details_to_dic[details_temp[0]] = details_temp[1]

    # Chunks that come from nested objects carry no 'name' key, so skip them
    if 'name' in details_to_dic:
        cleaned_data[details_to_dic['name']] = details_to_dic

So you have the data as a dictionary (example data of company APA):

print(cleaned_data['APA'])

output:

{'action_company': 'Downgrades', 'action_pt': 'Lowers', 'analyst': 'Citigroup', 'analyst_name': 'Scott Gruber', 'currency': 'USD', 'date': '2021-10-01', 'exchange': 'NASDAQ', 'id': '61573ba273a5f300019bb64a', 'importance': '0', 'name': 'APA', 'notes': '', 'pt_current': '23.0000', 'pt_prior': '27.0000', 'rating_current': 'Neutral', 'rating_prior': 'Buy', 'ticker': 'APA', 'time': '12', 'updated': '1633106928', 'url': 'https', 'url_calendar': 'https', 'url_news': 'https', 'quote': ''}
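
Note that in the output above 'url' is just 'https' and 'time' is just '12', because every value is cut at its first colon. Since the assigned value is JSON, one alternative is to let the json module decode it. The following is only a sketch: extract_analyst_data is a hypothetical name, and the sample page source, including the example.com URL and the time value, is made up for illustration. With Selenium, html_source would be driver.page_source.

```python
import json
import re

# Matches the assignment up to the start of the JSON value
MARKER = re.compile(r"window\.initialAnalystData\s*=\s*")


def extract_analyst_data(html_source):
    """Decode the JSON array assigned to window.initialAnalystData."""
    match = MARKER.search(html_source)
    if match is None:
        raise ValueError("window.initialAnalystData not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the ';' and the rest of the script)
    rows, _end = json.JSONDecoder().raw_decode(html_source, match.end())
    return rows


# Made-up page source in the same shape as the question's <script> block
sample_html = '''<script>
  function initTableData() {
    window.initialAnalystData = [{"name":"APA","analyst":"Citigroup","time":"12:48:48","url":"https://example.com/apa","pt_current":"23.0000"}];
    window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};
  }
</script>'''

rows = extract_analyst_data(sample_html)
by_name = {row["name"]: row for row in rows}
print(by_name["APA"]["url"])
```

Keying the dictionary by name keeps only the last row for each company, as the loop above does. Keeping rows as a list avoids losing entries when a company has several ratings in the date range.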

2 Comments

Thank you so much! It works like a charm! Now I am trying to select the date range using send_keys and XPath, but the element is difficult to locate since the dates are set inside the window script.
You can locate it via XPath and then click it with driver.find_element(By.XPATH, 'PATH HERE').click() (written as driver.find_element_by_xpath('PATH HERE').click() in Selenium versions before 4.3). For example, try this: driver.find_element(By.XPATH, '//*[@id="analyst-calendar"]/div/div[2]/div/div/div/div[2]/div[2]/div[1]/div[2]/div/div/div[1]/div[3]/span/span').click()
