readlines() error with for-loop in python

Question

This error is hard to describe because I can't figure out how the loop is even affecting the readline() and readlines() methods. When I try using the former, I get these unexpected Traceback errors. When I try the latter, my code runs and nothing happens. I have determined that the bug is located in the first eight lines. The first few lines of the Topics.txt file are posted below.

Code

import requests
from html.parser import HTMLParser
from bs4 import BeautifulSoup

Url = "https://ritetag.com/best-hashtags-for/"
Topicfilename = "Topics.txt"
Topicfile = open(Topicfilename, 'r')
Line = Topicfile.readlines()
Linenumber = 0
for Line in Topicfile:
    Linenumber += 1
    print("Reading line", Linenumber)

    Topic = Line
    Newtopic = Topic.strip("\n").replace(' ', '').replace(',', '')
    print(Newtopic)
    Link = Url.join(Newtopic)
    print(Link)
    Sourcecode = requests.get(Link)

When I run this bit here, it prints the URL preceded by the first character of the line. For example, it prints 2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ etc. for 24 Hour Fitness.

Topics.txt

21st Century Fox
24 Hour Fitness
2K Games
3M

Full Error

Reading line 1
24HourFitness
2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ohttps://ritetag.com/best-hashtags-for/uhttps://ritetag.com/best-hashtags-for/rhttps://ritetag.com/best-hashtags-for/Fhttps://ritetag.com/best-hashtags-for/ihttps://ritetag.com/best-hashtags-for/thttps://ritetag.com/best-hashtags-for/nhttps://ritetag.com/best-hashtags-for/ehttps://ritetag.com/best-hashtags-for/shttps://ritetag.com/best-hashtags-for/s

Traceback (most recent call last):
  File "C:\Users\Caden\Desktop\Programs\LususStudios\AutoDealBot\HashtagScanner.py", line 17, in <module>
    Sourcecode = requests.get(Link)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 579, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 653, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ohttps://ritetag.com/best-hashtags-for/uhttps://ritetag.com/best-hashtags-for/rhttps://ritetag.com/best-hashtags-for/Fhttps://ritetag.com/best-hashtags-for/ihttps://ritetag.com/best-hashtags-for/thttps://ritetag.com/best-hashtags-for/nhttps://ritetag.com/best-hashtags-for/ehttps://ritetag.com/best-hashtags-for/shttps://ritetag.com/best-hashtags-for/s'

The file is read in one gulp with Line = Topicfile.readlines(). Just eliminate that line.
– dawg, Commented Aug 6, 2016 at 19:43
Under the hood, the readlines method "consumes" the file, so when it returns, the underlying file position pointer is at the end of the file. Then you try to read the file some more in the for loop, but since it's already at the end it does nothing. Use only one of the two methods.
– Keith, Commented Aug 6, 2016 at 19:47
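
The behavior described in these comments can be reproduced without the original file. What follows is only a minimal sketch: it assumes an in-memory io.StringIO object as a stand-in for Topics.txt, since it supports the same readlines(), iteration, and seek() calls as a text file returned by open().

```python
import io

# Stand-in for Topics.txt (assumption: in-memory text instead of a real file).
topicfile = io.StringIO("21st Century Fox\n24 Hour Fitness\n2K Games\n3M\n")

lines = topicfile.readlines()      # reads everything; the position is now at the end
after_readlines = list(topicfile)  # iterating again finds nothing left to read

topicfile.seek(0)                  # rewind to the start of the data
after_seek = list(topicfile)       # the lines are available again

print(len(lines))            # 4
print(len(after_readlines))  # 0
print(len(after_seek))       # 4
```

Rewinding with seek(0) is shown only to make the file position visible; in the original script it is simpler to read the file once, either with readlines() or with the for loop.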

Karin · Accepted Answer · 2016-08-06 19:29:29Z

1

I think there are two issues:

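You seem to be iterating over Topicfile instead of the list returned by Topicfile.readlines(). Because readlines() has already consumed the file, the loop over Topicfile has nothing left to read.
Url.join(Newtopic) isn't returning what you think it is. str.join takes an iterable of strings (here a single string, which is iterated character by character) and inserts Url between each pair of characters.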

Here is code with these problems addressed:

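import requests

Url = "https://ritetag.com/best-hashtags-for/"
Topicfilename = "Topics.txt"
Topicfile = open(Topicfilename, 'r')
# Read every line once and keep the resulting list
Lines = Topicfile.readlines()
Topicfile.close()
Linenumber = 0
# Loop over the list, not over the already-consumed file object
for Line in Lines:
    Linenumber += 1
    print("Reading line", Linenumber)

    Topic = Line
    Newtopic = Topic.strip("\n").replace(' ', '').replace(',', '')
    print(Newtopic)
    # Append the topic to the URL; Url.join() would interleave Url between characters
    Link = '{}{}'.format(Url, Newtopic)
    print(Link)
    Sourcecode = requests.get(Link)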

As an aside, I also recommend using lowercase variable names, since capitalized (CapWords) names are generally reserved for class names in Python :)


1 Comment

Capattax Over a year ago

Yes, thank you! I didn't even think about the adjoining of the two variables affecting the for-loop. I have revised my code to Link = Url + Newtopic

OneCricketeer · Accepted Answer · 2016-08-06 19:36:37Z

Firstly, the Python convention is to use lowercase variable names.

Secondly, you exhaust the file object when you read all the lines with readlines(), and then you try to loop over the same file again.

Instead, simply open the file and loop over it:

linenumber = 0
with open("Topics.txt") as topicfile:
    for line in topicfile:
        # do work 
        linenumber += 1
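
For reference, a slightly fuller sketch of that loop, under a few assumptions: build_links is a hypothetical helper name, io.StringIO stands in for the real Topics.txt, enumerate replaces the manual counter, and the actual requests.get call is left out so the example runs offline. It also uses strip() rather than strip("\n"), which removes any surrounding whitespace.

```python
import io

URL = "https://ritetag.com/best-hashtags-for/"


def build_links(topicfile):
    """Return one link per non-empty line of an open text file (hypothetical helper)."""
    links = []
    # enumerate(..., start=1) replaces the manual linenumber counter
    for linenumber, line in enumerate(topicfile, start=1):
        topic = line.strip().replace(" ", "").replace(",", "")
        if not topic:
            continue  # skip blank lines
        print("Reading line", linenumber)
        links.append(URL + topic)
    return links


# Stand-in for: with open("Topics.txt") as topicfile: links = build_links(topicfile)
sample = io.StringIO("21st Century Fox\n24 Hour Fitness\n2K Games\n3M\n")
links = build_links(sample)
print(links[1])  # https://ritetag.com/best-hashtags-for/24HourFitness
```

Each resulting link could then be passed to requests.get(link) inside the loop, as in the original script.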

Then, about the traceback: if you look closely, you are building a very long string that is not a valid URL, so requests raises an error:

InvalidSchema: No connection adapters were found for '2https://ritetag.com/best-hashtags-for/4https://ritetag.com/...

You can debug this to see that Url.join(Newtopic) is "interleaving" the Url string between each character of the Newtopic string, which is what str.join does.
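
A short illustration of that interleaving, using the topic "3M" from the sample file so the output stays readable. The lowercase names url and newtopic are only for this sketch.

```python
url = "https://ritetag.com/best-hashtags-for/"
newtopic = "3M"

# str.join uses the string it is called on as the separator between the
# items of its argument; iterating over a str yields its characters.
joined = url.join(newtopic)
print(joined)  # 3https://ritetag.com/best-hashtags-for/M

# Plain concatenation (or str.format) produces the intended link.
link = url + newtopic
print(link)    # https://ritetag.com/best-hashtags-for/3M
```

With a longer topic such as 24HourFitness, the separator is repeated between every pair of characters, which is how the very long string in the traceback came about.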
