
I have a list of URLs like:

l=['bit.ly/1bdDlXc','bit.ly/1bdDlXc',.......,'bit.ly/1bdDlXc']

I just want to resolve every short URL in that list to its full URL.

Here is my approach:

import urllib2

for i in l:
    print urllib2.urlopen(i).url

But when the list contains thousands of URLs, the program takes a long time.

My question: is there any way to reduce the execution time, or another approach I should follow?

  • Might be worth looking at dev.bitly.com (specifically dev.bitly.com/links.html#v3_expand, which allows 15 URLs to be expanded at a time). No doubt there are some Python bitly wrappers on pypi or code.google - but I'll leave you to search for those. Commented Aug 11, 2014 at 14:24
  • Do all of the URLs have a hostname of bit.ly? Commented Aug 11, 2014 at 14:31
  • @Robᵩ No, not all of the URLs are associated with bit.ly. Commented Aug 11, 2014 at 14:32
  • @JonClements But not all of the URLs are associated with bitly. Commented Aug 11, 2014 at 14:34
  • Well, use the bitly API for the ones that are... if there are other common shorteners, they'll probably have APIs that can be used as well... otherwise, you're stuck with your current approach of seeing where you end up after redirection. You may wish to consider multi-threading/processing to make multiple requests at the same time (a sketch follows this list). Commented Aug 11, 2014 at 14:37
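
Following up on the multi-threading suggestion in the last comment, here is a minimal sketch, assuming Python 3, the requests package, and a plain thread pool; the helper names expand and expand_all are illustrative and not taken from any answer below:

import requests
from concurrent.futures import ThreadPoolExecutor

def expand(url):
    # HEAD avoids downloading the body; allow_redirects=True follows the
    # whole redirect chain, so resp.url is the final destination.
    try:
        resp = requests.head("http://" + url, allow_redirects=True, timeout=10)
        return url, resp.url
    except requests.RequestException as exc:
        return url, str(exc)

def expand_all(short_urls, workers=20):
    # Resolve many short URLs concurrently with a simple thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(expand, short_urls))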

4 Answers


First method

As suggested in the comments, one way to accomplish the task would be to use the official bitly API, which does, however, have limitations (e.g., no more than 15 shortUrl parameters per request).
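
A minimal sketch of this first method, assuming the v3 expand endpoint linked in the comments (https://api-ssl.bitly.com/v3/expand) accepts an access_token plus up to 15 repeated shortUrl parameters; the token placeholder and the exact response fields should be checked against the current bitly documentation:

import requests

ACCESS_TOKEN = "YOUR_BITLY_ACCESS_TOKEN"  # placeholder, not a real token

def expand_batch(short_urls):
    # Assumption: /v3/expand takes up to 15 shortUrl parameters per request
    # and returns the results under data -> expand in the JSON body.
    resp = requests.get(
        "https://api-ssl.bitly.com/v3/expand",
        params={"access_token": ACCESS_TOKEN, "shortUrl": short_urls},
    )
    resp.raise_for_status()
    entries = resp.json()["data"]["expand"]
    return {e.get("short_url"): e.get("long_url") for e in entries}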

Second method

As an alternative, one could simply avoid fetching the contents, e.g. by using the HEAD HTTP method instead of GET. Here is some sample code that makes use of the excellent requests package:

import requests

l=['bit.ly/1bdDlXc','bit.ly/1bdDlXc',.......,'bit.ly/1bdDlXc']

for i in l:
    print requests.head("http://"+i).headers['location']
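
If a short link goes through more than one hop, the Location header above only gives the next hop. As a small variation (not part of the original snippet), requests.head() can be asked to follow the whole chain and report the final URL:

for i in l:
    # allow_redirects=True follows the whole redirect chain; .url is the final URL
    print requests.head("http://"+i, allow_redirects=True).url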

2 Comments

As a bonus, requests.head() doesn't follow the redirect, so it saves at least one HTTP transaction.
Actually it doesn't return the final link; see my answer
from requests import get

def get_real_url_from_shortlink(url):
    # get() follows redirects by default, so resp.url holds the final URL
    resp = get(url)
    return resp.url



I'd try Twisted's asynchronous web client. Be careful with this, though: it doesn't rate-limit at all.

#!/usr/bin/python2.7

from twisted.internet import reactor
from twisted.internet.defer import Deferred, DeferredList, DeferredLock
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool
from twisted.web.http_headers import Headers
from pprint import pprint
from collections import defaultdict
from urlparse import urlparse
from random import randrange
import fileinput

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 16
agent = Agent(reactor, pool)
locks = defaultdict(DeferredLock)  # one lock per (host, slot) pair to cap per-host concurrency
locations = {}  # maps each input URL to its Location header (or an error message)

def getLock(url, simultaneous = 1):
    return locks[urlparse(url).netloc, randrange(simultaneous)]

@inlineCallbacks
def getMapping(url):
    # Limit ourselves to 4 simultaneous connections per host
    # Tweak this as desired, but make sure that it is no larger than
    # pool.maxPersistentPerHost.
    lock = getLock(url,4)
    yield lock.acquire()
    try:
        resp = yield agent.request('HEAD', url)
        locations[url] = resp.headers.getRawHeaders('location',[None])[0]
    except Exception as e:
        locations[url] = str(e)
    finally:
        lock.release()


dl = DeferredList(getMapping(url.strip()) for url in fileinput.input())
dl.addCallback(lambda _: reactor.stop())

reactor.run()
pprint(locations)
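
Since the script reads its input via fileinput.input(), it can be given a file name on the command line (e.g. python expand.py urls.txt, where expand.py is just an illustrative name) or fed a list of URLs on standard input.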



You can use the 'pyurlextract' library to extract the full link, and all redirections, from a shortened link.

pip install pyurlextract

You will find all the details here: https://pypi.org/project/pyurlextract/

from pyurlextract import extract_shorturl

short_url = "https://url.com/3Bg19uM"  # The actual short URL
full_link, all_links = extract_shorturl(short_url)

if full_link is None:
    print("Failed to expand the URL")
    print("Details:", all_links)
else:
    print("Original URL:", short_url)
    print("Full Link:", full_link)
    print("All Possible Redirections:", all_links)
