10

I am making a website. I want to check from the server whether the link that the user submitted is actually an image that exists.

4 Answers

20

This is the approach that works best for my application, also drawing on the previous comments:

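import requests

def is_url_image(image_url):
   # "image/jpg" is not a registered MIME type, but some servers send it
   image_formats = ("image/png", "image/jpeg", "image/jpg")
   r = requests.head(image_url)
   # .get() avoids a KeyError when the server sends no Content-Type header
   return r.headers.get("content-type") in image_formats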

2 Comments

Checking the MIME type from the response headers is a much better way of doing it, but I would be wary of relying on a HEAD request: I've heard that some websites do not handle HEAD correctly, and a GET request may serve better. That said, the case I am referring to involved the content-size header, not the content-type header, so it may not apply here.
For two different URLs, one an image and one a non-image, r.headers["content-type"] = "text/html; charset=iso-8859-1", i.e. this function returns False for both. Probing deeper, the reason seems to be that my "image" URL actually redirects to a new URL where the image lives. That is seamless in the browser and when downloading, but the header only comes back as an image type if you manually trace the redirects to the "final" URL where the image "really" lives. Using that URL, the routine returns True. So use this routine with caution: it returns False more often than you might expect.
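The two caveats raised in the comments above (redirects and unreliable HEAD support) can be handled roughly as follows. This is only a sketch under some assumptions: the allow-list IMAGE_TYPES, the helper name is_image_content_type, the 10-second timeout, and the choice to retry with GET on any 4xx/5xx status are all illustrative and not part of the original answer. It relies on requests.head() not following redirects unless allow_redirects=True is passed.

```python
import requests

IMAGE_TYPES = ("image/png", "image/jpeg", "image/gif")  # assumed allow-list


def is_image_content_type(header_value):
    """Return True if a Content-Type header value names an allowed image type."""
    if not header_value:
        return False
    # "image/png; charset=binary" -> "image/png"
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in IMAGE_TYPES


def is_url_image(image_url, timeout=10):
    """Check the Content-Type of the final URL after following redirects."""
    try:
        # requests.head() does not follow redirects unless asked to.
        r = requests.head(image_url, allow_redirects=True, timeout=timeout)
        if r.status_code >= 400:
            # Some servers mishandle HEAD; retry with a streamed GET so the
            # body is not downloaded, then release the connection.
            r = requests.get(image_url, stream=True, timeout=timeout)
            r.close()
    except requests.RequestException:
        return False
    return r.ok and is_image_content_type(r.headers.get("content-type"))
```

Splitting the header parsing into its own function keeps the network call thin and makes the "text/html; charset=iso-8859-1" case from the comment easy to check without a server.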
14

This is one quick way:

It doesn't verify that the file really is an image; it just guesses based on the file extension and then checks that the URL exists. If you need to verify that the data returned from the URL is actually an image (for security reasons), this solution will not work.

import mimetypes, urllib2

def is_url_image(url):
    mimetype, encoding = mimetypes.guess_type(url)
    # bool() so the function returns False rather than None for unknown types
    return bool(mimetype and mimetype.startswith('image'))

def check_url(url):
    """Returns True if the url returns a 2xx response code,
       otherwise returns False.
    """
    try:
        headers = {
            "Range": "bytes=0-10",
            "User-Agent": "MyTestAgent",
            "Accept": "*/*"
        }

        req = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(req)
        # A server that honours the Range header answers 206 Partial Content
        return 200 <= response.code < 300
    except Exception:
        return False

def is_image_and_ready(url):
    return is_url_image(url) and check_url(url)

5 Comments

A HEAD request would probably do, too.
I have found that more sites/servers support the Range header than respond properly to a HEAD request, even though that's what a HEAD request is for.
Curious. Is range 0-10 arbitrary? Could you, for example, request 0-0? Seems to be valid to do so: w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1
I think the import statement should say 'mimetypes'.
In case of parameters in the given URL:

import mimetypes

def is_url_image(url):
    mimetype, encoding = mimetypes.guess_type(url.split("?")[0])
    return (mimetype and mimetype.startswith('image'))
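For Python 3, the extension-based guess with the query string removed could look something like the sketch below. The function name guess_is_image is made up for illustration; it uses urllib.parse.urlsplit instead of splitting on "?" by hand so that a "#fragment" is dropped as well. Like the answer it builds on, it only inspects the URL text and says nothing about what the server actually returns.

```python
import mimetypes
from urllib.parse import urlsplit


def guess_is_image(url):
    """Guess from the file extension in the URL path whether it is an image."""
    path = urlsplit(url).path  # drops "?query" and "#fragment"
    mimetype, _encoding = mimetypes.guess_type(path)
    return bool(mimetype and mimetype.startswith("image/"))
```

For example, guess_is_image("http://example.com/a.png?size=large") evaluates the path "/a.png" only, so the query parameters do not hide the extension.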
4

You can read the headers of the HTTP response; they contain metadata such as the content type.

On python 3:

from urllib.request import urlopen

image_formats = ("image/png", "image/jpeg", "image/gif")
url = "http://localhost/img.png"
site = urlopen(url)  # sends a GET; the body is not read until site.read()
meta = site.info()  # headers of the HTTP response
# get_content_type() drops parameters such as "; charset=..."
if meta.get_content_type() in image_formats:
    print("it is an image")
site.close()

You can also read other information, such as the size of the image (the Content-Length header). The advantage is that the image body is not read, only the headers. The check can be wrong if the header claims an image type and the content is not one, so if a URL passes this first filter you can still download the data and verify it as a final check.
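As a rough illustration of reading the type and size from the headers, here is a small sketch. The function name image_header_summary and the 5 MiB limit are hypothetical. It expects the object returned by site.info(), which is an email.message.Message subclass, so a plain Message is used below to try it without a server. Note that get_content_type() falls back to "text/plain" when the header is missing, and that servers are free to omit Content-Length.

```python
from email.message import Message

MAX_BYTES = 5 * 1024 * 1024  # hypothetical size limit, not from the answer


def image_header_summary(headers, max_bytes=MAX_BYTES):
    """Return (media_type, size, acceptable) from a response header object."""
    media_type = headers.get_content_type()  # lower-cased, parameters removed
    raw_length = headers.get("Content-Length")
    # size stays None when the header is absent or not a plain number
    size = int(raw_length) if raw_length and raw_length.isdigit() else None
    acceptable = media_type.startswith("image/") and (
        size is None or size <= max_bytes
    )
    return media_type, size, acceptable


# Offline usage example with hand-built headers
demo = Message()
demo["Content-Type"] = "image/png"
demo["Content-Length"] = "2048"
summary = image_header_summary(demo)  # ("image/png", 2048, True)
```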

Comments

1

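Take a look at imghdr, a standard-library module that identifies an image type from the first bytes of the data. (It was deprecated in Python 3.11 and removed in Python 3.13.)

Here is some example code (Python 2):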

import imghdr
import httplib
import cStringIO

conn = httplib.HTTPConnection('www.ovguide.com', timeout=60)
path = '/img/global/ovg_logo.png'
conn.request('GET', path)
r1 = conn.getresponse()

image_file_obj = cStringIO.StringIO(r1.read())
what_type = imghdr.what(image_file_obj)

print what_type

This should print 'png'. If the data is not a recognized image format, imghdr.what returns None.

Hope that helps!

-Blake

1 Comment

If you absolutely want to be sure it's an image, this is the way to go, but it comes at the cost of retrieving the whole image file first.
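One possible way to reduce that cost is to read only the first few bytes of the response and compare them with well-known file signatures, which is roughly what imghdr does internally. The sketch below is for Python 3 and does not use imghdr; the names SIGNATURES, sniff_image_type and sniff_url, the 32-byte read, and the three formats covered are assumptions chosen for illustration. A matching signature only shows that the data starts like an image, not that the whole file decodes correctly.

```python
import http.client
from urllib.request import Request, urlopen

# Leading byte signatures of a few common formats (not an exhaustive list).
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(data):
    """Return a format name if data starts with a known signature, else None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def sniff_url(url, timeout=10):
    """Read only the first 32 bytes of the response body and sniff them."""
    req = Request(url, headers={"User-Agent": "MyTestAgent"})
    try:
        with urlopen(req, timeout=timeout) as response:
            return sniff_image_type(response.read(32))
    except (OSError, ValueError, http.client.HTTPException):
        # Covers URLError/HTTPError, malformed URLs and protocol errors
        return None
```

For a full guarantee that the data is a valid image, decoding it with an image library after download is still the stricter check.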
