
I am a few days into Python and pandas and have run into a problem that I cannot resolve on my own. I have a for loop that fetches status codes and prints the results if they meet certain criteria. The loop is as follows:

For loop:

import requests
from requests.exceptions import HTTPError

# url_list is assumed to be defined already (the URL column read from the .csv)
response_full_result = []
failed_result = []  # collects the HTTPError exceptions raised below

for url in url_list:
    try:
        response = requests.get(url)
        response_full_result.append(response)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()

    except HTTPError as http_err:
        failed_result.append(http_err)
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        print('Success!')

This loop iterates over a column of a .csv file and issues a GET request for each row to fetch its status code. It also prints a message for each outcome, in the order the requests run. For example, if the first three rows of the column result in 200, 400, and NaN, the output is: Success, HTTP error, and Other error (respectively).

Desired result: I understand that, by design, it prints as expected. What I would like is for all the outputs (success, HTTP error, Other error) to be stored in a variable or list that I can work with later. Is there a way to do this? I tried the append method, and using pickle means I would have to convert to a dictionary, which is not ideal. Is there a way to do this in Python or pandas?

Also, thanks to this doc for providing the for loop; it is not mine. I am using PyCharm with Python 3.9. I am new to this and only started last week. I have found a lot of material but no answer that helped in my particular situation. Apologies if I missed it.

Thank you to anyone who can help and give suggestions!

1 Answer


You can create a new list, for example status_codes, and append a status to it on each iteration. Then you can use zip() to pair the URLs with their statuses, or create a new DataFrame. For example:

import pandas as pd
import requests
from requests.exceptions import HTTPError

response_full_result = []
failed_result = []  # collects the HTTPError exceptions raised in the loop

url_list = [
    "https://www.google.com",
    "https://www.yahoo.com",
    "https://xxx.domain.example",
]

status_codes = []  # <-- here we store status codes

for url in url_list:
    try:
        response = requests.get(url)
        response_full_result.append(response)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        failed_result.append(http_err)
        print(f"HTTP error occurred: {http_err}")
        status_codes.append("HTTP error")
    except Exception as err:
        print(f"Other error occurred: {err}")
        status_codes.append("Other error")
    else:
        print("Success!")
        status_codes.append("success")

print()

# print the results - use `zip()`
for url, status in zip(url_list, status_codes):
    print("{:<30} {}".format(url, status))

print()

# create a dataframe and write it to csv:
df = pd.DataFrame({"URL": url_list, "Status": status_codes})
print(df)
df.to_csv("data.csv", index=False)

Prints:

Success!
Success!
Other error occurred: HTTPSConnectionPool(host='xxx.domain.example', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1e1c5e4100>: Failed to establish a new connection: [Errno -2] Name or service not known'))

https://www.google.com         success
https://www.yahoo.com          success
https://xxx.domain.example     Other error

                          URL       Status
0      https://www.google.com      success
1       https://www.yahoo.com      success
2  https://xxx.domain.example  Other error

And creates data.csv:

URL,Status
https://www.google.com,success
https://www.yahoo.com,success
https://xxx.domain.example,Other error
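
As a variation on the approach above, here is a minimal sketch that returns each URL together with its label and a detail string, so the pairs cannot drift out of step if a branch forgets to append. The names check_url and check_urls, the get parameter, and the 10-second timeout are assumptions made for illustration; they are not part of the original code.

```python
import requests
from requests.exceptions import HTTPError


def check_url(url, get=requests.get, timeout=10):
    """Return a (url, label, detail) tuple describing one request.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    try:
        response = get(url, timeout=timeout)
        # Raises HTTPError for 4xx/5xx responses
        response.raise_for_status()
    except HTTPError as http_err:
        return (url, "HTTP error", str(http_err))
    except Exception as err:  # connection failures, invalid URLs, etc.
        return (url, "Other error", str(err))
    return (url, "success", str(response.status_code))


def check_urls(urls, get=requests.get):
    """Check every URL in order and collect the result tuples in a list."""
    return [check_url(url, get=get) for url in urls]
```

The returned list can be used directly later on, for instance with pd.DataFrame(results, columns=["URL", "Status", "Detail"]).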


3 Comments

Thank you so much! This worked, and you anticipated where I was going with my next steps. I noticed something I have not seen before, {:<30} in the code block for zip: is this a zip feature, or does it mean something in Python for an object? I will look into zip in more depth. Thank you again for the assistance and guidance. Truly appreciate it.
@knowledgealways {:<30} is just string formatting: it left-aligns the URL in a field 30 characters wide. It's only there for pretty-printing.
Ah, I suspected as much, and now you have confirmed it. I appreciate that you had pretty-printing in mind. After studying your edits, I see that you appended a label for each outcome to the same variable, status_codes, then used zip and, after formatting, created a new DataFrame to work with. Thank you very much for helping me accomplish my task (:
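
To illustrate the {:<30} format spec discussed in these comments, here is a small sketch showing that it is ordinary string formatting and has nothing to do with zip(). The helper name format_row and its width parameter are assumptions made for illustration.

```python
def format_row(url, status, width=30):
    """Left-align `url` in a field `width` characters wide, then add the status."""
    # Equivalent to "{:<30} {}".format(url, status) when width is 30
    return f"{url:<{width}} {status}"


rows = [
    ("https://www.google.com", "success"),
    ("https://xxx.domain.example", "Other error"),
]

for url, status in rows:
    print(format_row(url, status))
```

Because every URL is padded to the same width, the status column lines up regardless of URL length.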
