I am trying to find the API that this site gets its data from, but I can't find it.

Site Link: https://govservices.dcra.dc.gov/contractorratingsystem/BuildingProfessionals/BuildingProfessional?profType=General%20Contractor&profName=

In the network tab I can see the data in the XHR response, and it changes each time another page is selected. I need to extract this data, but I don't know where the website gets it from or how to request it myself. I am completely new to this. Can you guide me on how to get the data or scrape this website? I have looked for related examples but have not found a suitable one. Thanks in advance.

2 Answers

The code below will get you the data you are looking for.

Use the recordCount field in the response to set the range you need to loop over.

How it works

The website uses an API call to get the data as JSON. It uses paging: the client passes the page index and the page size to the server, so the server can work out the page offset and which records to return. The code below simulates this: the loop increments the page index, and that is how we iterate over the data.

import requests
import time

headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-language": "en-US,en;q=0.9,el;q=0.8,he;q=0.7,de;q=0.6,fr;q=0.5,it;q=0.4,es;q=0.3",
    "cache-control": "no-cache",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "pragma": "no-cache",
    "sec-ch-ua": "\"Google Chrome\";v=\"93\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"93\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"macOS\"",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "x-requested-with": "XMLHttpRequest"
}
body = {'professionalType': 'General Contractor',
        'Name': '',
        'sortName': 'OverallScore',
        'sortDirection': 'desc',
        'pageIndex': 0,
        'pageSize': 10}
url = 'https://govservices.dcra.dc.gov/contractorratingsystem/BuildingProfessionals/LoadProfessionalSearchResultsWithFilters'
for i in range(1, 3):  # TODO use actual range based on 'recordCount' (in the response) and 'pageSize'
    body['pageIndex'] = i
    r = requests.post(url, headers=headers, data=body)
    if r.status_code == 200:
        print(f'{i} --> {r.json()}')
    else:
        print(f'status code is {r.status_code}')
    time.sleep(1)

Output

1 --> {'buildingProfessionals': [{'buildingProfessional': 'REVOLUTION SOLAR LLC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000062', 'projectCount': 822, 'planReviewScore': 96.1732900783996, 'applicationIntakeScore': 96.2433090024331, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 0, 'planReviewScoreRating': 4.80866450391998, 'applicationIntakeScoreRating': 4.812165450121655, 'inspectionScoreRating': 5, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '10746 JUDY LANE COLUMBIA MD 21044', 'businessPhone': '4438655039', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'AMERICAN AUTOMATIC SPRINKLER CO', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410514000016', 'projectCount': 471, 'planReviewScore': 0.29603315571344, 'applicationIntakeScore': 0.245115452930728, 'inspectionScore': 99.7122042886194, 'overAllProjectScore': 100, 'stopWorkOrders': 12, 'planReviewScoreRating': 0.014801657785672, 'applicationIntakeScoreRating': 0.0122557726465364, 'inspectionScoreRating': 4.98561021443097, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '3149 DRAPER DRIVE FAIRFAX VA 22031', 'businessPhone': '7038498180', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'FIRE & LIFE SAFETY AMERICA INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000410', 
'projectCount': 348, 'planReviewScore': 0.218818380743982, 'applicationIntakeScore': 0.218818380743982, 'inspectionScore': 99.781181619256, 'overAllProjectScore': 100, 'stopWorkOrders': 2, 'planReviewScoreRating': 0.0109409190371991, 'applicationIntakeScoreRating': 0.0109409190371991, 'inspectionScoreRating': 4.9890590809628, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '3017 VERNON ROAD RICHMOND VA 23228', 'businessPhone': '8042221381', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'NORTHERN FIRE PROTECTION, INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000183', 'projectCount': 250, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 98.7889273356401, 'overAllProjectScore': 100, 'stopWorkOrders': 2, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.939446366782005, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '21530 BLACKWOOD COURT SUITE #150 STERLING VA 20166', 'businessPhone': '7034069811', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'PHOENIX FIRE PROTECTION INC.', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000155', 'projectCount': 174, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 4, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 5, 
'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '7901 PENN RANDALL PLACE UPPER MARLBORO MD 20772', 'businessPhone': '3016697066', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'JENSON FIRE PROTECTION INC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410517000309', 'projectCount': 146, 'planReviewScore': 0.632911392405063, 'applicationIntakeScore': 0.632911392405063, 'inspectionScore': 100, 'overAllProjectScore': 100, 'stopWorkOrders': 0, 'planReviewScoreRating': 0.03164556962025315, 'applicationIntakeScoreRating': 0.03164556962025315, 'inspectionScoreRating': 5, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '8740 CHERRY LANE UNIT 13 LAUREL MD 20707', 'businessPhone': '', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'LIVINGSTON FIRE PROTECTION INC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410516000203', 'projectCount': 145, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 98.3734939759036, 'overAllProjectScore': 100, 'stopWorkOrders': 5, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.91867469879518, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '5150 LAWRENCE PLACE HYATTSVILLE MD 20781', 'businessPhone': 
'3017794466', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'RIDGEWAY CORPORATION PROFESSIONAL CORPORATION', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000087', 'projectCount': 145, 'planReviewScore': 0.798403193612774, 'applicationIntakeScore': 1.4251497005988, 'inspectionScore': 95.688622754491, 'overAllProjectScore': 100, 'stopWorkOrders': 4, 'planReviewScoreRating': 0.0399201596806387, 'applicationIntakeScoreRating': 0.07125748502994, 'inspectionScoreRating': 4.78443113772455, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '12514 KENSINGTON LANE BOWIE MD 20715', 'businessPhone': '3014642003', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'FORTRESS PROTECTION GROUP', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 'businessName': None, 'contactNumber': '410518000115', 'projectCount': 124, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 99.4932432432432, 'overAllProjectScore': 100, 'stopWorkOrders': 5, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.97466216216216, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '18618 BROKEN OAK RD BOYDS MD 20841', 'businessPhone': '', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}, {'buildingProfessional': 'PRIME FIRE PROTECTION LLC', 'buildingProfessionalType': 'General-Contractor', 'permitType': None, 
'businessName': None, 'contactNumber': '410517000488', 'projectCount': 120, 'planReviewScore': 0, 'applicationIntakeScore': 0, 'inspectionScore': 94.4320987654321, 'overAllProjectScore': 100, 'stopWorkOrders': 20, 'planReviewScoreRating': 0, 'applicationIntakeScoreRating': 0, 'inspectionScoreRating': 4.721604938271605, 'overAllProjectScoreRating': 5, 'useCategory': None, 'businessEmail': '[email protected]', 'imageName': 'noimage.png', 'imageUrl': 'https://govservices.dcra.dc.gov/ProfessionalImages/noimage.png', 'businessAddress': '13549 JAMIESON PL GERMANTOWN MD 20874', 'businessPhone': '3104736189', 'flag': '', 'professionalDisplayName': 'General Contractor', 'webAddress': 'N/A', 'bbb': 'NOT ACCREDITED'}], 'pageIndex': 1, 'pageSize': 10, 'recordCount': 1113}
...
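
The TODO in the loop above (deriving the range from 'recordCount' and 'pageSize') could be filled in roughly as follows. This is only a sketch: `page_count`, `collect_all`, and `fetch_page` are hypothetical names, pages are assumed to start at index 1 as in the loop above, and the response keys are taken from the output shown.

```python
import math
from typing import Callable


def page_count(record_count: int, page_size: int) -> int:
    """Number of pages needed to cover record_count items."""
    return math.ceil(record_count / page_size)


def collect_all(fetch_page: Callable[[int], dict], page_size: int = 10) -> list:
    """Fetch page 1 to learn 'recordCount', then fetch the remaining pages.

    fetch_page is a hypothetical callable: it takes a page index and returns
    the decoded JSON dict for that page (for example, a wrapper around the
    requests.post call above that returns r.json()).
    """
    first = fetch_page(1)  # assumption: page indexes start at 1
    records = list(first['buildingProfessionals'])
    total_pages = page_count(first['recordCount'], page_size)
    for index in range(2, total_pages + 1):
        records.extend(fetch_page(index)['buildingProfessionals'])
    return records
```

With the values in the output above (recordCount 1113, pageSize 10), `page_count(1113, 10)` gives 112 pages. Keeping the `time.sleep(1)` pause inside whatever `fetch_page` does is probably a good idea so the server is not hit too quickly.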

5 Comments

Thanks @balderman. Can I get an explanation of the whole code so that I understand how and why it works?
Sure - I will add that to the answer. Did you test it?
Yes sir, I have tested it and it's working.
Sir, one more thing: can you give me some more examples or websites to refer to so that I can learn this?
You need to learn about Web Scraping. Please accept the answer.

You could use requests to hit the URL that generates the response and then use Beautiful Soup to parse it, maybe?
