I get this error when I try to download a large number of pages from a website. The script is pieced together and modified from several other scripts, and I am rather unfamiliar with Python and programming in general.

The version of Python is 3.4.3 and the version of Requests is 2.7.0.

This is the script:

import requests
from bs4 import BeautifulSoup
import os.path

s = requests.session()
login_data = {'dest': '/','user': '******', 'pass': '******'}
header_info={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'}
url='http://www.oxfordreference.com/LOGIN'
s.post(url,data=login_data,headers=header_info)

for i in range(1,100):
    downprefix='http://www.oxfordreference.com/view/10.1093/acref/9780198294818.001.0001/acref-9780198294818-e-'
    downurl=downprefix+str(i)
    r=s.get(downurl,headers=header_info,timeout=30)
    if r.status_code==200:
        soup=BeautifulSoup(r.content,"html.parser")
        shorten=str(soup.find_all("div", class_="entryContent"))
        fname='acref-9780198294818-e-'+str(i)+'.htm'
        newname=os.path.join('shorten',fname)
        htmfile=open(newname,'w',encoding="utf_8")
        htmfile.write(shorten)
        htmfile.close()
        print('Success in '+str(i))
    else:
        print('Error in '+str(i))
        errorfile=open('errors.txt','a',encoding="utf_8")
        errorfile.write(str(i))
        errorfile.write('\n')
        errorfile.close()

The complete traceback is:

Traceback (most recent call last):
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 374, in _make_request
    httplib_response = conn.getresponse()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 1171, in getresponse
    response.begin()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\adapters.py", line 370, in send
    timeout=timeout
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 597, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\util\retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\packages\six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 374, in _make_request
    httplib_response = conn.getresponse()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 1171, in getresponse
    response.begin()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "D:\Program Files (x86)\python343\lib\http\client.py", line 321, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine("''",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\stuff\Mdict\dict by me\odoa\newahktest\CrawlTest2.py", line 14, in <module>
    r=s.get(downurl,headers=header_info,timeout=30) 
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\sessions.py", line 477, in get
    return self.request('GET', url, **kwargs)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "D:\Program Files (x86)\python343\lib\site-packages\requests\adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))

1 Answer

The host you're talking to did not respond properly: the empty BadStatusLine('') in the traceback means the connection was closed before any HTTP status line arrived. This often happens when you connect to an HTTPS service over plain HTTP, but there are many other possible causes.

Probably the best way to check what's going on is to use a network traffic analyser (for example Wireshark) and look at the connection.
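
Since the error shows up while fetching many pages in a loop, one possible mitigation while you investigate is to retry a failed request after a short pause instead of letting the whole script stop. This is only a sketch under assumptions: nothing in the traceback proves the failure is transient, and retry_call with its parameters is a hypothetical helper, not part of Requests.

```python
import time


def retry_call(func, exceptions, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() and return its result, retrying on the given exceptions.

    retry_call is a hypothetical helper written for illustration.
    exceptions: an exception class (or tuple of classes) worth retrying.
    attempts:   total number of tries before giving up.
    delay:      base pause in seconds; the pause grows linearly per attempt.
    sleep:      injectable so the helper can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of tries: let the caller see the original error.
                raise
            sleep(delay * attempt)
```

In the script from the question, the request line could then be written as r = retry_call(lambda: s.get(downurl, headers=header_info, timeout=30), requests.exceptions.ConnectionError), with a try/except around it so that a page that still fails is logged to errors.txt rather than aborting the loop.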

1 Comment

Thank you for your suggestion. I looked closely at the request headers in Firefox's developer tools and found that the cookie in the browser is different from the cookie I get after "r=s.get(downurl,headers=header_info,timeout=30)". Is there anything wrong?
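
To make that comparison concrete, a small sketch like the one below could list which cookies differ between the browser and the script. The function names and the cookie names in the usage note are hypothetical. Note that a session ID with a different value is normal for two separate logins; a cookie name that exists in the browser but is missing from the script's session is usually the more telling difference.

```python
def parse_cookie_header(header):
    """Turn a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in header.split(';'):
        # Split on the first '=' only, since values may contain '='.
        name, sep, value = part.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


def diff_cookies(browser_header, session_cookies):
    """Return {name: (browser_value, session_value)} for cookies that differ.

    browser_header:  the Cookie header copied from the browser's dev tools.
    session_cookies: a plain dict of the cookies held by the script.
    A value of None means the cookie is absent on that side.
    """
    browser = parse_cookie_header(browser_header)
    report = {}
    for name in sorted(set(browser) | set(session_cookies)):
        b_value = browser.get(name)
        s_value = session_cookies.get(name)
        if b_value != s_value:
            report[name] = (b_value, s_value)
    return report
```

With Requests, the second argument can be obtained from the session in the question as s.cookies.get_dict(). For example, diff_cookies("JSESSIONID=abc; lang=en", {"lang": "en"}) reports that the made-up JSESSIONID cookie is present in the browser but absent from the script.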
