
I'm on Ubuntu 14.04, using Python 2.7 to scrape with rotating proxies... After a few minutes of scraping I get this error:

raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))


            if keyword1 in text and keyword2 in text and keyword3 in text:
                print("LINK SCRAPED")
                print(text, "link scraped")
                found = True 
                break 

except requests.exceptions.ConnectionError as err:
    print("Encountered ConnectionError, retrying: {}".format(err))

If this is not the correct way to implement try, am I right that only the request goes inside the try clause and everything else comes after the except?

  • I will remove the beautifulsoup tag. Commented Jan 6, 2017 at 2:46

1 Answer


Instead of restarting the script, you can handle the error using a try / except statement.

For example:

try:
    response = requests.get(url)  # the line of code that is failing
except requests.exceptions.ConnectionError as err:
    print("Encountered ConnectionError, retrying: {}".format(err))

Then just retry the original call.
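
For example, a minimal retry wrapper might look like this (the fetch name and the max_attempts cap are my own additions, not something from your script):

import time
import requests

def fetch(url, max_attempts=5, **kwargs):
    # Hypothetical helper: retry the GET on ConnectionError,
    # up to max_attempts times.
    for attempt in range(max_attempts):
        try:
            return requests.get(url, **kwargs)
        except requests.exceptions.ConnectionError as err:
            print('Encountered ConnectionError, retrying: {}'.format(err))
            time.sleep(1)  # brief pause before the next attempt
    raise RuntimeError('gave up on {} after {} attempts'.format(url, max_attempts))

Capping the attempts keeps a dead proxy from looping forever, and requests.get accepts proxies and headers through **kwargs just like your direct calls.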

UPDATE: Based on your updated code sample, here's what I'd do:

from bs4 import BeautifulSoup  # lxml must be installed for the 'lxml' parser
import random
import requests
import time


proxies = {'https': '100.00.00.000:00000'}
hdr1 = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}

hdrs = [hdr1] #, hdr2, hdr3, hdr4, hdr5, hdr6, hdr7]
# random.choice(hdrs) returns a whole header dict, not a User-Agent string,
# so copy the chosen dict and override its Connection header.
head = dict(random.choice(hdrs))
head['Connection'] = 'close'

#####   REQUEST  1  ####
# Retry the request until it succeeds instead of letting a
# ConnectionError kill the script.
done = False
while not done:
    try:
        a = requests.get('https://store.fabspy.com/sitemap.xml', proxies=proxies, headers=head)
        done = True
    except requests.exceptions.ConnectionError as err:
        print('Encountered ConnectionError, retrying: {}'.format(err))
        time.sleep(1)

scrape = BeautifulSoup(a.text, 'lxml')
# Pull the first <loc> entry that points at the products sitemap.
links = scrape.find_all('loc')
for link in links:
    if 'products' in link.text:
        sitemap = str(link.text)
        break

keyword1 = 'not'
keyword2 = 'on'
keyword3 = 'site'

#########    REQUEST 2 #########
done = False
while not done:
    try:
        r = requests.get(sitemap, proxies=proxies, headers=head)
        done = True
    except requests.exceptions.ConnectionError as err:
        print('Encountered ConnectionError, retrying: {}'.format(err))
        # back off a little longer between retries here
        time.sleep(random.randint(4, 6))

soup = BeautifulSoup(r.text, 'lxml')
links = soup.find_all('loc')
for link in links:
    text = link.text
    # Only report links that contain all three keywords.
    if keyword1 in text and keyword2 in text and keyword3 in text:
        print('{} link scraped'.format(text))  # plain string, not a tuple, on Python 2
        break
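
Note the pattern in both request blocks: the while not done loop keeps retrying the same URL until a request finally succeeds, sleeping between attempts so a flaky proxy isn't hammered. If you'd rather give up after a while, swap in a bounded loop like the fetch sketch above.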

5 Comments

I have attempted to apply this to a slimmer version of the script I am running; I have edited it in above. Can you verify?
Should the try statement contain the entire request loop? Or only the initial request, with the rest of the loop after the except?
@ColeWorld I just updated my answer to include a re-written code sample for ya.
Thanks, so far this seems to solve the error issue, but I think it conflicts with the keyword search. If you pass any string to the keywords where one keyword matches some link on the site, it returns that link.
Sorry, not sure what you mean by keyword search. I was only looking at handling the error correctly, I'm not terribly familiar with the other logic in your program.
