7

I am trying to scrape the number of reviews of a place from Google Maps using Python. For example, the restaurant Pike's Landing (see the Google Maps URL below) has 162 reviews. I want to pull this number in Python.

URL: https://www.google.com/maps?cid=15423079754231040967

I am not very well versed in HTML, but based on some basic examples on the internet I wrote the following code. All I get after running it is a blank variable. If you could let me know what I am doing wrong, that would be much appreciated.

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page ='https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs={'class':'widget-pane-link'})
print(price_box.text)
  • Scraping full map data is really hard. Why not try using an API instead? Commented Dec 29, 2017 at 12:26
  • I am not trying to scrape a full map, just a specific number on the pane on the left side of the map. Also, the Google Maps API does not return the number of reviews as of now. Commented Dec 29, 2017 at 12:33
  • The number may be added by JavaScript, and urllib + BeautifulSoup can't run JavaScript. You could use Selenium to control a web browser, which will load the page and run the JavaScript. Or you can try to find this information in the JavaScript code, either directly in the HTML or in external *.js files. JavaScript can also use AJAX/XHR to load data from a different URL, and you can use DevTools in Chrome/Firefox to find that URL. XHR responses are mostly JSON strings, which you can easily convert to a Python dictionary using the json module. Commented Dec 29, 2017 at 13:22
  • BTW: Google uses JavaScript to add elements to the page, but if Google sees that the client doesn't use JavaScript, it may send a page that doesn't need JavaScript; in that case the elements are mostly in different tags with different classes. So you can turn off JavaScript in your browser and load the map again to see what BeautifulSoup gets from Google. Or you can save the data from urlopen() to a file and open that file in a web browser or text editor. Commented Dec 29, 2017 at 13:26
  • I am not very familiar with Selenium or JavaScript, but I can definitely look into that. I also wanted to confirm: are you suggesting that I can scrape Google Maps using the simple approach I used? I was hoping to make minor changes to the code snippet I posted above to accomplish my goal. Commented Dec 29, 2017 at 13:39
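
The diagnostic suggested in the comments can be sketched roughly as follows: download the page exactly as urlopen sees it (no JavaScript is run), save it for inspection, and check whether the class name used in the question appears in it at all. The helper names fetch_html, save_for_inspection and has_marker are hypothetical, and the sample string is a made-up stand-in for a JavaScript-rendered page, not real Google Maps output.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download the raw response body as urlopen sees it (no JavaScript is run)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save_for_inspection(html, path):
    """Write the downloaded HTML to a file so it can be opened in a browser or editor."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def has_marker(html, marker):
    """Return True if the given class name (or any other text) occurs in the raw HTML."""
    return marker in html


# Made-up stand-in for a page whose visible content is built later by JavaScript.
sample = '<html><body><div id="app"></div><script>render()</script></body></html>'
print(has_marker(sample, "widget-pane-link"))  # False
```

To try this on the real page, one could call fetch_html(quote_page), pass the result to save_for_inspection, and then to has_marker. If the class name is missing from the raw HTML, soup.find_all() has nothing to match, which would be consistent with the empty result described in the question.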

3 Answers

5

It's hard to do in pure Python without an API. Here's what I ended up with (note that I added &hl=en to the end of the URL to get results in English rather than in my language):

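import re
import requests
from ast import literal_eval

urls = [
    'https://www.google.com/maps?cid=15423079754231040967&hl=en',
    'https://www.google.com/maps?cid=16168151796978303235&hl=en',
]

for url in urls:
    html = requests.get(url).text
    # Match the escaped JSON array that starts with a link and contains "N reviews".
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', html):
        # Turn the escaped JSON fragment into a Python literal.
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # Decode any backslash escape sequences left in the URL.
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])  # the review count as text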

Prints:

http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews

0

You need to view the page source and parse the window.APP_INITIALIZATION_STATE variable block with a regular expression; all the data you need is there.
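
A minimal sketch of that idea, assuming the review count appears as text of the form "N reviews" somewhere after the window.APP_INITIALIZATION_STATE marker. The layout of that block is undocumented and may change, so the sample string below is a hypothetical, heavily shortened stand-in, and review_count_from_source is an illustrative name rather than part of any library.

```python
import re

MARKER = "window.APP_INITIALIZATION_STATE"
# One or more digits, optionally with thousands separators, followed by "review(s)".
REVIEWS_RE = re.compile(r"(\d[\d,]*) reviews?")


def review_count_from_source(page_source):
    """Return the first 'N reviews' count found after the marker, or None."""
    start = page_source.find(MARKER)
    if start == -1:
        return None  # the marker is not in this page source
    match = REVIEWS_RE.search(page_source, start)
    if match is None:
        return None  # no review count after the marker
    return int(match.group(1).replace(",", ""))


# Hypothetical, shortened stand-in for a real page source (assumption, not real data).
sample = (
    '<script>window.APP_INITIALIZATION_STATE=[[["x",'
    '[\\"http://example.com\\",\\"469 reviews\\",null]]]];</script>'
)
print(review_count_from_source(sample))  # 469
```

Returning None instead of raising keeps the caller in control when Google serves a page without the marker or without a review count.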


Alternatively, you can use the Google Maps Reviews API from SerpApi.

Example JSON output:

"place_results": {
  "title": "Pike's Landing",
  "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
  "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
  "gps_coordinates": {
    "latitude": 64.8299557,
    "longitude": -147.8488774
  },
  "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
  "rating": 3.9,
  "reviews": 839,
  "price": "$$",
  "type": [
    "American restaurant"
  ],
  "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
  "service_options": {
    "dine_in": true,
    "curbside_pickup": true,
    "delivery": false
  }
}

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_maps",
    "type": "search",
    "q": "pike's landing",
    "ll": "@40.7455096,-74.0083012,14z",
    "google_domain": "google.com",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

reviews = results["place_results"]["reviews"]

print(reviews)

Output:

839

Disclaimer, I work for SerpApi.

2 Comments

That's not Python only; it's a paid API. I expected a solution using Python or related libraries.
What do you mean it's not Python? from X import X should tell you right away that it's a Python script. If you were talking about SerpApi, the code example above also uses a Python package.
-4

Scraping Google Maps without a browser or proxies will lead to blocking after a few successful requests. Therefore, the main problem of scraping Google is dealing with cookies and reCAPTCHA.

This is a good post where you can see an example of using Selenium in Python for the same purpose. The general idea is that you start a browser and simulate what a user does on the website.

Another way is to use a reliable third-party service that does the whole job for you and returns the results. For example, you can try Outscraper's Reviews service, which has a free tier.

from datetime import datetime, timedelta

from outscraper import ApiClient

api_client = ApiClient(api_key='SECRET_API_KEY')

# Get reviews of the specific place by id
result = api_client.google_maps_reviews('ChIJrc9T9fpYwokRdvjYRHT8nI4', reviewsLimit=20, language='en')

# Get reviews for places found by search query
result = api_client.google_maps_reviews('Memphis Seoul brooklyn usa', reviewsLimit=20, limit=500, language='en')

# Get only new reviews during last 24 hours
yesterday_timestamp = int((datetime.now() - timedelta(1)).timestamp())

result = api_client.google_maps_reviews(
    'ChIJrc9T9fpYwokRdvjYRHT8nI4', sort='newest', cutoff=yesterday_timestamp, reviewsLimit=100, language='en')

Disclaimer, I work for Outscraper.
