How to convert an HTML file to JSON using Python

Question

I want to fetch an HTML file from some location and convert it to JSON format using Python.

With the code below, the output I get is just the text.

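from bs4 import BeautifulSoup
import json

html = '<p>Hello</p><p>world</p>'
soup = BeautifulSoup(html, 'html.parser')
# string=True matches only text nodes (text=True is the older, deprecated name for this argument)
things = soup.find_all(string=True)
print(things)  # prints the text nodes, not JSON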

Yash Shukla · Accepted Answer · 2019-07-25 09:13:59Z

-1

 jsonD = json.dumps(htmlContent.text) serializes the raw HTML text as a JSON 
 string literal. jsonL = json.loads(jsonD) then parses that JSON string back into a 
 regular string/unicode object. Together the two calls are a no-op: any escaping added 
 by dumps() is undone by loads(), so jsonL contains the same data as htmlContent.text.
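
 To see the round trip for yourself, here is a minimal sketch. The html_text value is a made-up stand-in for htmlContent.text, and the snake_case names are mine, not from the original code:

```python
import json

# Hypothetical stand-in for htmlContent.text (the raw HTML of a fetched page).
html_text = '<p>Hello</p><p class="greeting">world</p>'

json_d = json.dumps(html_text)  # a JSON string literal: wrapped in quotes, inner quotes escaped
json_l = json.loads(json_d)     # parsed back into an ordinary Python str

print(json_d)                   # the escaped JSON form
print(json_l == html_text)      # the round trip gives back the original text
```

 So the dumps()/loads() pair by itself does not add any structure to the HTML; it only escapes and unescapes it.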

 Instead of building the JSON by hand, use json.dumps to generate your final JSON 
 document:

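# urls, uniqueID, htmlContent and finalDate are assumed to be defined earlier in your script.
# finalDate must already be JSON-serializable (for example, a string).
ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})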
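
A self-contained version of the same idea might look like the sketch below. The build_record helper, the example URL, the sample HTML and the use of uuid for the ID are all assumptions for illustration; in your script the values would come from wherever you fetch the page.

```python
import json
import uuid
from datetime import date


def build_record(url, page_html, fetched_on):
    """Return a JSON string describing one fetched page (hypothetical helper)."""
    return json.dumps({
        'url': str(url),
        'uid': str(uuid.uuid4()),        # assumed ID scheme, replace with your own
        'page_content': page_html,       # raw HTML, escaped by json.dumps
        'date': fetched_on.isoformat(),  # dates must be converted to strings first
    })


# Sample values standing in for a real fetched page.
record = build_record(
    'https://example.com/page.html',
    '<p>Hello</p><p>world</p>',
    date(2019, 7, 25),
)

parsed = json.loads(record)
print(parsed['page_content'])
print(parsed['date'])
```

Letting json.dumps serialize the whole dict means quotes and special characters inside the HTML are escaped for you, which is easy to get wrong when concatenating strings by hand.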


3 Comments

mohan111 Over a year ago

This answer seems to be lifted directly from here: stackoverflow.com/questions/43469412/…

Yash Shukla Over a year ago

Yes, because the issue is the same and the solution works, so what's wrong with that? @mohan111

Yash Shukla Over a year ago

Most users avoid opening links, so it's better to post the suggestion directly. That's why I did it. @mohan111
