I have successfully managed to scrape the website listed from JS into a local .html file, but the output falls short.
The issues are:
- it only produces the last query (audioSource) and not the other requests
- it finds only episode 1, and stops there. How do I make it repeat until it finds the end?
Many thanks
import requests
import json
from bs4 import BeautifulSoup
JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = JSONDATA.json()
for line in JSONDATA['posts']:
soup = BeautifulSoup(line['episodeNumber'],'lxml')
soup = BeautifulSoup(line['title'],'lxml')
soup = BeautifulSoup(line['image']['large'],'lxml')
soup = BeautifulSoup(line['excerpt']['long'],'lxml')
soup = BeautifulSoup(line['audioSource'],'lxml')
with open("output1.html", "w") as file:
file.write(str(soup))
