Linked Questions

-1 votes
2 answers
3k views

Possible Duplicate: Extracting text from HTML file using Python. What is the best way in Python to extract text from HTML pages the same way a browser does when you copy and paste?
Mark Vital
-2 votes
4 answers
4k views

I want to extract the paragraph text within the HTML tags in Python: <p style="text-align: justify;"><span style="font-size: small; font-family: lato, ...
s.s
  • 93
-3 votes
2 answers
849 views

I have an HTML file in the following format and want to parse it using Python. However, I don't know how to use the xml module. Your suggestions are highly welcome. Note: sorry for my ignorant ...
Frank Wang
  • 1,610
-3 votes
2 answers
1k views

Possible Duplicate: Extracting text from HTML file using Python Parsing Source Code (Python) Approach: Beautiful Soup, lxml, html5lib difference? Currently have a large webpage whose source code ...
zhuyxn
  • 7,121
1 vote
1 answer
978 views

I've tried the following: import urllib link = 'https://automatetheboringstuff.com/chapter7/' f = urllib.request.urlopen(link) myfile = f.read() print(myfile) But that just seems to return the page'...
Fashinated
-1 votes
1 answer
143 views

I am trying to scrape a web page containing the names of companies. The names are between tags. The format is: <option value="15589" id="optExhibitor15589" title="N571 Company One, Inc">N571 ...
Chuck Kile
1 vote
1 answer
101 views

I need to scrape only the textual content under the Reference h3 at this URL. I'm trying with this code, but I can't get the text in the same order as shown on the HTML page. i=43 ...
Poggio
  • 131
0 votes
0 answers
51 views

How can I extract the relevant information from the jumble of text I got from this jumble of HTML: ><br>Inspect, diagnose, maintain, and operate test setups and equipment to detect ...
JPdL
  • 149
77 votes
10 answers
76k views

I'm doing some web scraping, and sites frequently use HTML entities to represent non-ASCII characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type? For ...
Cristian
  • 44.2k
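
For the HTML-entity question above, a minimal sketch, assuming Python 3.4 or later: the standard library's `html.unescape` converts named and numeric character references into the characters they denote. The helper name `decode_entities` and the sample string are hypothetical, chosen only for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references (named or numeric) with plain characters."""
    return html.unescape(text)


# Hypothetical sample input, not taken from the question.
sample = "caf&eacute; &amp; cr&#232;me"
print(decode_entities(sample))
```

On Python 2, where the question's "unicode type" wording comes from, this function is not available under that name, so the sketch applies to Python 3 only.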
64 votes
7 answers
69k views

I need to convert Markdown text to plain text to display a summary on my website. I want the code in Python.
Krish
  • 1,183
49 votes
7 answers
154k views

I'm looking for a way to search a whole Subversion server. I already have one piece of the puzzle: searching within a repository. Now I need to do this for every repository. Update: I have to access ...
lamcro
  • 6,301
29 votes
7 answers
62k views

I am trying to scroll to the end of a page so that I can make all the data visible and extract it. I tried to find a command for it, but it seems to be available only in Java (driver.executeScript), and I couldn't find ...
Prabhjot Singh Rai
18 votes
8 answers
28k views

Because regular expressions scare me, I'm trying to find a way to remove all HTML tags and resolve HTML entities from a string in Python.
akraut
  • 536
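
For the question above about removing tags and resolving entities without regular expressions, one possible sketch uses the standard library's `html.parser.HTMLParser`, which decodes character references in text when `convert_charrefs=True`. The names `TextExtractor` and `strip_tags` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of a document, discarding all tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(markup: str) -> str:
    """Return the text content of markup with tags removed and entities resolved."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


print(strip_tags("<p>Fish &amp; <b>chips</b></p>"))
```

This sketch keeps the contents of `<script>` and `<style>` elements as text, so a real page would likely need extra filtering or a dedicated library such as Beautiful Soup.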
3 votes
6 answers
43k views

I am trying to access the article content from a website, using beautifulsoup with the below code: site= 'www.example.com' page = urllib2.urlopen(req) soup = BeautifulSoup(page) content = soup....
Mustard Tiger
16 votes
5 answers
12k views

Is there a simple way to launch the system's default editor from a Python command-line tool, like the webbrowser module?
pkit
  • 8,369
