I want to fetch an HTML file from some location and convert it to JSON format using Python.

With the code below, the output I get is just the text, not JSON.

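from bs4 import BeautifulSoup
import json
html = '<p>Hello</p><p>world</p>'
soup = BeautifulSoup(html, 'html.parser')
# find_all(text=True) returns only the text nodes, so the tag structure is lost
things = soup.find_all(text=True)
print(things)  # ['Hello', 'world']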

1 Answer

jsonD = json.dumps(htmlContent.text) converts the raw HTML content into a JSON string representation. jsonL = json.loads(jsonD) then parses that JSON string back into a regular string/unicode object. The result is a no-op, because any escaping done by dumps() is reverted by loads(): jsonL contains the same data as htmlContent.text.

Use json.dumps to generate your final JSON instead of building the JSON by hand:

ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
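For the HTML in the question, one possible way to get structured JSON rather than bare text is to collect a record per text-bearing element and pass the list to json.dumps. The sketch below is only an illustration: it swaps in the standard library's html.parser module instead of BeautifulSoup so that it runs without extra packages, and the names TagTextCollector and html_to_json, as well as the {'tag', 'text'} record shape, are assumptions rather than anything from the original code.

```python
import json
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: records each piece of text with its enclosing tag.

    Void elements such as <br> are not handled specially in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._open_tags = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open tag
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open_tags:
            self.records.append({'tag': self._open_tags[-1], 'text': text})


def html_to_json(html):
    """Return a JSON array with one object per text-bearing element."""
    parser = TagTextCollector()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.records)


print(html_to_json('<p>Hello</p><p>world</p>'))
# [{"tag": "p", "text": "Hello"}, {"tag": "p", "text": "world"}]
```

The same idea should carry over to BeautifulSoup by building the list of dictionaries from the parsed elements before calling json.dumps.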

3 Comments

This answer seems to be lifted directly from here: stackoverflow.com/questions/43469412/…
Yes, because the issue is the same and the solution works, so what's wrong with that? @mohan111
Most users avoid opening links, so it's better to give the suggestion directly. That's why I did it. @mohan111
