
I am trying to extract data from a web page where the options in the dropdown lists are loaded dynamically based on earlier selections. I am using Selenium WebDriver to extract the data from the dropdowns. Please see the screenshots below.

Dropdown 1 - State

Dropdown 2 - City

Dropdown 3 - Station

The City dropdown options are loaded once I select a state, and the Station dropdown is loaded after I select a city.

So far I have been able to extract the station names with this code.

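import time

from selenium.webdriver.common.keys import Keys

# driver, cityDropdown and cityOptions are set up earlier in the script (not shown)
citiesList = []
stationNameList = []
siteIdList = []

# skip the first option and collect the city names
for city in cityOptions[1:]:
    citiesList.append(city.text)

stationDropDown = driver.find_element_by_xpath("//select[contains(@id,'stations')]")
stationOptions = stationDropDown.find_elements_by_tag_name('option')

# select each city in turn, wait for the station list to load, then print it
for ele in citiesList:
    cityDropdown.send_keys(ele, Keys.RETURN)
    time.sleep(2)
    stationDropDown.click()
    print(stationDropDown.text)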

State Options

City Options

Option values from station dropdown

Can anyone please help me extract the site IDs for every state and city?

  • Website URL pls? Commented Oct 19, 2020 at 3:46
  • app.cpcbccr.com/AQI_India Commented Oct 19, 2020 at 10:38
  • Only station site IDs are available on the website. Do you want to scrape only those, or something else as well? Commented Oct 19, 2020 at 11:10
  • Station site IDs and station names. Commented Oct 19, 2020 at 12:47

1 Answer


Try the approach below using Python's requests library: it is simple, straightforward, reliable, and fast, and it needs less code. I found the API URL on the website itself by inspecting the Network tab in Google Chrome.

What the script below does:

  1. It sends a POST request to the API URL with the payload (the payload is essential, and the request must be a POST) and receives the data in the response.
  2. It parses the JSON response with json.loads.
  3. Finally, it iterates over the list of stations one by one and prints the state name, city name, station name, and station site ID.
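The iteration in step 3 can be sketched on a small hand-made sample. The field names (stations, stateID, stationsInCity, cityID, name, id) are the ones the script reads from the response; the sample values and the helper name flatten_stations are made up for illustration and are not real API output.

```python
def flatten_stations(result):
    """Turn the nested state -> stations structure into a flat list of rows."""
    rows = []
    for state in result['stations']:
        for station in state['stationsInCity']:
            rows.append({
                'state': state['stateID'],
                'city': station['cityID'],
                'station': station['name'],
                'site_id': station['id'],
            })
    return rows


# Hypothetical sample shaped like the response the script parses (values are invented)
sample = {
    'stations': [
        {
            'stateID': 'State_A',
            'stationsInCity': [
                {'cityID': 'City_X', 'name': 'Station One', 'id': 'site_0001'},
                {'cityID': 'City_Y', 'name': 'Station Two', 'id': 'site_0002'},
            ],
        },
    ],
}

for row in flatten_stations(sample):
    print(row['state'], row['city'], row['station'], row['site_id'])
```

Collecting rows in a list instead of printing them directly makes it easier to write the station names and site IDs to a file later.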

Network call tab: [screenshot]

Output of the code below: [screenshot of the Python script output]

import json

import requests


def scrape_aqi_site_id():
    URL = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'  # API URL
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='  # payload copied from the network request
    # POST the payload to the API; note that verify=False disables TLS certificate verification
    response = requests.post(URL, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON response body
    extracted_states = result['stations']
    for state in extracted_states:  # loop over each state and its station data
        print('=' * 120)
        print('Scraping station data for state : ' + state['stateID'])
        for station in state['stationsInCity']:  # loop over the stations listed for this state
            print('-' * 100)
            print('Scraping data for city and its station : City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City :' + station['cityID'])
            print('Station Name : ' + station['name'])
            print('Station Site Id : ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state : (' + state['stateID'] + ') is completed, now going for another one...')
        print('=' * 120)


scrape_aqi_site_id()
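The payload string in the script is Base64 text. Decoding it gives a small JSON object with a millisecond timestamp and a time zone offset. A minimal sketch of decoding it and rebuilding the same string follows. The helper names decode_payload and build_payload are hypothetical, and whether the server accepts a payload built with a different timestamp is an assumption that the answer does not confirm.

```python
import base64
import json
import time

# Payload string copied from the script above
CAPTURED_PAYLOAD = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='


def decode_payload(payload):
    """Decode the Base64 payload into a Python dict."""
    return json.loads(base64.b64decode(payload).decode('utf-8'))


def build_payload(timestamp_ms=None, tz_offset_minutes=-330):
    """Build a payload in the same format (assumption: the server accepts a fresh timestamp)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    body = json.dumps(
        {'time': timestamp_ms, 'timeZoneOffset': tz_offset_minutes},
        separators=(',', ':'),  # compact JSON, matching the captured payload
    )
    return base64.b64encode(body.encode('utf-8')).decode('ascii')


print(decode_payload(CAPTURED_PAYLOAD))
```

If the captured payload ever stops working, passing build_payload() as the data argument of requests.post would be the first thing to try.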

2 Comments

Thanks @Vin. Appreciate your help. The script worked for me. However, I couldn't find the POST URL app.cpcbccr.com/aqi_dashboard/aqi_station_all_india that you used. I searched for it among the requests in the network tab of app.cpcbccr.com/AQI_India.
For your reference, I have added an image of my network tab. Also, if that works for you, please upvote and accept the answer. Thanks.
