
I need to extract all the URLs for the IPs in a list. I wrote this Python script, but the same IP gets processed multiple times (several threads end up working on the same IP). Could anyone improve my solution while keeping it multithreaded?

Sorry for my English. Thanks, all.

import urllib2, os, re, sys, os, time, httplib, thread, argparse, random

try:
    ListaIP = open(sys.argv[1], "r").readlines()
except(IOError): 
    print "Error: Check your IP list path\n"
    sys.exit(1)



def getIP():
    if len(ListaIP) != 0:
        value = random.sample(ListaIP,  1)
        ListaIP.remove(value[0])
        return value
    else:
        print "\nListaIPs sa terminat\n"
        sys.exit(1)

def extractURL(ip):
    print ip + '\n'
    page = urllib2.urlopen('http://sameip.org/ip/' + ip)
    html = page.read()
    links = re.findall(r'href=[\'"]?([^\'" >]+)', html)
    outfile = open('2.log', 'a')
    outfile.write("\n".join(links))
    outfile.close()

def start():
    while True:
        if len(ListaIP) != 0:
            test = getIP()
            IP = ''.join(test).replace('\n', '')
            extractURL(IP)
        else:
            break


for x in range(0, 10):
    thread.start_new_thread( start, () )

while 1:
    pass
  • It works ok to import os once; no need to import it twice. Commented Dec 17, 2012 at 17:04

1 Answer

Use a threading.Lock. The lock should be global and created at the beginning, when you create the IP list.

Call lock.acquire() at the start of getIP() and release it before you leave the function.

What you are seeing is this: thread 1 executes value = random.sample(ListaIP, 1), and then thread 2 executes the same line before thread 1 gets to the remove. The item is still in the list when thread 2 samples it, so both threads can end up with the same IP.
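The acquire/release pattern described above can be sketched as follows. This is a minimal illustration, not the asker's final script: it is written for Python 3 (the question uses Python 2), the sample addresses and the worker function are hypothetical stand-ins, random.choice replaces random.sample(ListaIP, 1), and getIP returns None when the list is empty instead of calling sys.exit(1).

```python
import random
import threading

# Hypothetical stand-in for the list the question reads from a file.
ListaIP = ["10.0.0.%d" % i for i in range(1, 101)]

# Created once at module level, next to the list it protects.
lock = threading.Lock()

def getIP():
    """Remove and return one random IP, or None when the list is empty."""
    lock.acquire()
    try:
        if not ListaIP:
            return None
        value = random.choice(ListaIP)
        ListaIP.remove(value)
        return value
    finally:
        # Runs on every exit path, including both returns above.
        lock.release()

def worker(taken):
    """Hypothetical worker: record each IP this thread was handed."""
    while True:
        ip = getIP()
        if ip is None:
            return
        taken.append(ip)

# One private result list per thread, so only ListaIP is shared.
results = [[] for _ in range(10)]
threads = [threading.Thread(target=worker, args=(r,)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_taken = [ip for r in results for ip in r]
```

Because the sample and the remove now happen while one thread holds the lock, no other thread can pick an item between the two steps. The try/finally is there so the lock is released even though the function returns from inside the guarded section.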


2 Comments

A cooler way to use locks: the with lock: statement (locks are context managers).
I tried def getIP(): lock.acquire() if len(ListaIP) != 0: value = random.sample(ListaIP, 1) ListaIP.remove(value[0]) return value else: print "\nListaIPs sa terminat\n" sys.exit(1) lock.release() but it is not working. It gives me this error: global name 'lock' is not defined.
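The two comments above fit together in one sketch. The "global name 'lock' is not defined" error is what Python 2 reports when getIP() runs before any module-level name lock has been assigned, so the lock has to be created at the top level first. The quoted code also returns (or exits) before it ever reaches lock.release(), which the with form avoids. The version below is a hedged Python 3 sketch; the sample addresses are hypothetical, and returning None on an empty list is a substitute for the original sys.exit(1).

```python
import random
import threading

# Hypothetical sample data in place of the file-backed list.
ListaIP = ["192.0.2.%d" % i for i in range(1, 51)]

# Defined at module level, before getIP() is ever called.
lock = threading.Lock()

def getIP():
    # The with statement acquires the lock on entry and releases it
    # on exit, even when the block is left through a return.
    with lock:
        if not ListaIP:
            return None
        value = random.choice(ListaIP)
        ListaIP.remove(value)
        return value

first = getIP()

# A non-blocking acquire succeeds only if getIP() released the lock.
still_usable = lock.acquire(blocking=False)
if still_usable:
    lock.release()
```

With this shape there is no explicit release call to forget, which is the main reason to prefer it over paired acquire() and release() calls.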
