
I need to scrape the data behind this graph and get it in tabular format. Link

The problem is the structure of this graph: it has months in the middle of the years. I have tried some online scrapers, but they take too much time and sometimes return distorted data.

In more detail, I am using this software, which I mention because it may help other people like me: app

What would you suggest I use to scrape this and get the best results? I need to scrape a lot of graphs of this kind :(

2 Answers

The data for the graph is embedded inside a <script> tag, so you can extract it as in the following example:

import json
import re

import pandas as pd
import requests


url = "https://www.instat.gov.al/en/sdgs/no-poverty/12-by-2030-reduce-at-least-by-half-the-proportion-of-men-women-and-children-of-all-ages-living-in-poverty-in-all-its-dimensions-according-to-national-definitions/121-proportion-of-population-living-below-the-national-poverty-line-by-sex-and-age/"

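# download the page; fail early on HTTP errors instead of parsing an error page
response = requests.get(url, timeout=30)
response.raise_for_status()
html_text = response.text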

# for map data:
# map_data = re.search(r"mapData=(.*?);<", html_text).group(1)
# print(map_data)

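# the chart data is assigned to graphsDataJson inside the <script> tag;
# capture the JSON literal up to the ";<" that closes the script
match = re.search(r"graphsDataJson=(.*?);<", html_text)
if match is None:
    raise ValueError("graphsDataJson not found in the page source")
graph_data = json.loads(match.group(1))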

df = pd.DataFrame(graph_data[0]["indicatorDataValues"])
print(df)

Prints:

   year  value
0  2017   23.7
1  2018   23.4
2  2019   23.0
3  2020   21.8
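
Since the question mentions many graphs of this kind, the same idea can be wrapped in a small helper. This is only a sketch: the name `extract_rows` is made up here, the sample HTML is synthetic, and it assumes other pages also assign a list of series with an `indicatorDataValues` key to `graphsDataJson`, which has not been checked beyond the page above.

```python
import json
import re

# Same pattern as in the answer; DOTALL is added in case the JSON spans lines.
GRAPH_RE = re.compile(r"graphsDataJson=(.*?);<", re.DOTALL)


def extract_rows(html_text):
    """Return one flat dict per data point for every series found in the page.

    Returns an empty list when the page has no graphsDataJson assignment.
    """
    match = GRAPH_RE.search(html_text)
    if match is None:
        return []
    rows = []
    for index, series in enumerate(json.loads(match.group(1))):
        # "indicatorDataValues" is the key used on the page in the answer;
        # other pages are assumed (not verified) to use the same key.
        for point in series.get("indicatorDataValues", []):
            rows.append({"series": index, **point})
    return rows


# Synthetic stand-in for a downloaded page, so the sketch runs offline.
sample_html = (
    '<script>var graphsDataJson=[{"indicatorDataValues":'
    '[{"year":2017,"value":23.7},{"year":2018,"value":23.4}]}];</script>'
)
rows = extract_rows(sample_html)
print(rows)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` and written out with `to_csv`, one file per URL.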

3 Comments

This code worked pretty well on Linux, but on macOS I repeatedly get the error: ModuleNotFoundError: No module named 'pandas'
I don't know what to do, as I have installed pandas in the terminal using pip3 install pandas
@Anisa Make sure you install pandas in the same environment in which you run this script.
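
One way to check for such a mismatch is to ask the interpreter that runs the script where it lives and whether it can see pandas. This is a minimal sketch; `module_available` is a helper name introduced here for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


# The interpreter actually executing this script (may differ from the one pip3 uses).
print("Interpreter:", sys.executable)
print("pandas importable:", module_available("pandas"))
# If this prints False, installing with that exact interpreter usually helps:
#   <path printed above> -m pip install pandas
```

If the path printed differs from the Python that `pip3` belongs to, the package was installed for another interpreter.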

I don't speak Albanian(?), but this website delivers the data not in a JSON file (which is more common) but in a JavaScript file.

Have a look at the network tab of your browser's developer tools and you will find the two URLs that contain the data:

https://www.instat.gov.al/scripts/alb-geojson/gadm36_ALB_2.js

and

https://www.instat.gov.al/scripts/alb-geojson/gadm36_ALB_3.js

Simply paste each URL into your browser and you will find nicely structured data (which I don't understand myself without context).
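
If such a JavaScript file simply assigns a JSON literal to a variable, the literal can be pulled out with the standard library. The sketch below rests on that assumption: the contents of the two files above were not inspected, `json_from_js_assignment` is a hypothetical helper, and the sample string is made up. It only works when the literal is strict JSON (double-quoted keys, no trailing commas).

```python
import json


def json_from_js_assignment(js_text):
    """Parse the JSON literal that follows the first '=' in a JS snippet."""
    eq = js_text.index("=")  # raises ValueError if there is no assignment
    # Locate the first opening bracket of an object or array after the '='.
    starts = [p for p in (js_text.find("{", eq), js_text.find("[", eq)) if p != -1]
    if not starts:
        raise ValueError("no JSON literal found after '='")
    # raw_decode parses one JSON value and ignores whatever follows it (e.g. ';').
    obj, _end = json.JSONDecoder().raw_decode(js_text, min(starts))
    return obj


# Made-up example of a JS file body; real files may look different.
sample_js = 'var regions = {"type": "FeatureCollection", "features": []};'
data = json_from_js_assignment(sample_js)
print(data["type"])
```

For a real file, the text would come from `requests.get(url).text` instead of `sample_js`.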

Edit: There is a download button for an xlsx file: https://www.instat.gov.al/media/5789/goal-1_indicator_1_2_1-web.xlsx


5 Comments

Did you see the Excel download button at the bottom? instat.gov.al/media/5789/goal-1_indicator_1_2_1-web.xlsx This gives you all the data available in the graph as an xlsx file. Isn't that exactly what you need? The data is a little hidden in the last columns.
I saw it, but it is not available for every SDG goal, just for 6 of them. Take a look here: link
Also, the years do not match those in their data sheet, because the graph has recent data, not data from 2012.
I can't find a way, sorry. Since there are only 17 SDGs... why not extract the data manually?
Of course, this was my plan B, but I was curious to know whether there was a way to automate the work. Anyway, thank you :)
