
This is my code that checks multiple URLs for a specific keyword and writes to the output file whether the keyword was found or not.

import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList= []

for url in urls:
    url_1 = url
    keyword ='myKeyword'
    res = requests.get(url_1)
    finalresult= print(keyword in res.text)

    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])

df.to_csv('/path/to/output.csv', index=False)

However, as soon as any of my URLs is down and there is an HTTP error, the script stops and the following error is displayed:

    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

How can I ignore such errors and let the script continue with the scan? Could someone help me with this? Thanks!

2 Answers

Try to put try..except only around requests.get() and res.text.

For example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList= []

for url in urls:
    url_1 = url
    keyword ='myKeyword'
    try:                                    # <-- put try..except here
        res = requests.get(url_1)
        finalresult = keyword in res.text   # <-- remove print()
    except:
        finalresult = False

    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])

df.to_csv('/path/to/output.csv', index=False)

EDIT: To put "Down" into the list when there's an error:

for url in urls:
    url_1 = url
    keyword ='myKeyword'
    try:                                    # <-- put try..except here
        res = requests.get(url_1)

        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except:
        myList.append("Down")

1 Comment

This works! Thank you Andrej. Just wondering how I could add a flag for HTTP errors. For example, if 'argos-yoga.com' is one of the URLs in my input file, I would like it marked as 'Down' instead of 'OK', as the page is not working. Can I add something like this somewhere in your code: except Exception as e: print(f"There was an error, error = {e}") myList.append("Down") pass? The reason I'm asking is that it would be good to know which URLs throw HTTP errors and have them saved in my errorLog.txt when I run this script from the terminal. Thank you!
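
(As an aside, here is a hedged sketch of what this comment asks for; the errorLog.txt name comes from the comment itself, everything else is an assumption and reuses the names from the answer above.)

for url in urls:
    try:
        res = requests.get(url)
        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except Exception as e:
        myList.append("Down")                      # flag the URL as down in the output column
        with open("errorLog.txt", "a") as log:     # append the failing URL and error for later review
            log.write(f"{url}: {e}\n")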

You can simply use the try-except approach.

Example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []

for url in urls:
    keyword = 'myKeyword'
    try:
        res = requests.get(url)
        finalresult = keyword in res.text
        print(finalresult)
        if finalresult:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except Exception as e:
        print(f"There was an error, error = {e}")
        myList.append("Down")   # append something so the list stays aligned with the URLs

# build the column and write the file once, after the loop has finished
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
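
One further suggestion, not part of either answer: requests.get() has no default timeout, so a host that accepts the connection but never responds can stall the whole scan. Passing a timeout makes such URLs fail quickly and fall into the except branch; the 10 seconds and the example URL below are illustrative values only.

import requests

try:
    # timeout is given in seconds; without it, requests may wait indefinitely on a silent host
    res = requests.get("https://example.com", timeout=10)
except requests.exceptions.RequestException as e:   # Timeout is a subclass, so it is caught here too
    print(f"There was an error, error = {e}")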

3 Comments

Thanks Ahmed! I've tried the above code; however, it doesn't add 'NOT OK' if finalresult == False. I get 'OK' for all URLs. Would you know how I can fix this?
I am not sure; so when finalresult is equal to False it says OK?
OK yeah, I figured it out: you need to add finalresult = keyword in res.text, since you had just assigned it to a print statement. I edited the code; try it and tell me if it worked.
