I need to make 1000 POST requests to the same domain. I was wondering if
there is a better and faster way to do this?
It depends. If the target is a static asset, or a servlet whose behaviour you know, and the same parameters return the same response each time, you can implement an LRU cache or some other caching mechanism. If not, 1K POST requests to some servlet are not a problem in themselves, even if they all go to the same domain.
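If the endpoint really is deterministic, the caching idea could look like the sketch below. It uses functools.lru_cache from the standard library. fetch_response is a hypothetical stand-in for the real POST call, and the cache size of 256 is an arbitrary assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=256)  # keep the 256 most recently used results (arbitrary size)
def fetch_response(param):
    """Hypothetical stand-in for a POST request; assumed to be deterministic."""
    # A real implementation would send the request here.
    return "response for {}".format(param)


if __name__ == "__main__":
    for param in [1, 2, 1, 1, 2, 3]:
        fetch_response(param)
    # hits = calls answered from the cache, misses = calls that did the work
    print(fetch_response.cache_info())
```

Note that lru_cache only accepts hashable arguments, so a dict of form fields would have to be turned into something like a tuple of pairs first.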
There is an answer that uses multiprocessing with the ThreadPool interface, which actually uses the main process with 15 threads. Does it run on a 15-core machine? A core can only run one thread at a time (hyper-threaded cores aside; does it run on 8 hyper-threaded cores?).
The ThreadPool interface lives inside a library with a misleading name, multiprocessing, even though Python also has a threading module. This is confusing as f#ck, so let's benchmark some lower-level code:
import psutil
from multiprocessing.pool import ThreadPool
from time import sleep


def get_url_data(param):
    print(param)  # just for convenience
    sleep(1)  # let's assume each request takes one second


if __name__ == '__main__':
    paramList = list(range(100))  # 100 urls
    pool = ThreadPool(psutil.cpu_count())  # one thread per core (ignoring hyper-threading for now)
    pool.map(get_url_data, paramList)  # split the jobs across the threads
    pool.close()
    pool.join()  # wait for the worker threads to exit
The code above uses the main process with 4 threads in my case, because my laptop has 4 CPUs. Benchmark result:
$ time python3_5_2 test.py
real 0m28.127s
user 0m0.080s
sys 0m0.012s
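Since the simulated work only sleeps, it may be worth checking whether the pool size has to match the core count at all. The sketch below times the same kind of workload at two pool sizes. simulated_request and time_pool are hypothetical helpers, and the 0.05 second delay is an arbitrary stand-in for network latency. In CPython a sleeping thread, like one waiting on a socket, does not occupy a core, so the larger pool would be expected to finish sooner for this kind of work.

```python
from multiprocessing.pool import ThreadPool
from time import perf_counter, sleep


def simulated_request(param):
    sleep(0.05)  # stand-in for network latency; the thread is idle while waiting
    return param


def time_pool(num_threads, num_tasks=20):
    """Return the wall-clock seconds needed to run num_tasks on num_threads threads."""
    start = perf_counter()
    with ThreadPool(num_threads) as pool:
        pool.map(simulated_request, range(num_tasks))  # blocks until all tasks finish
    return perf_counter() - start


if __name__ == "__main__":
    for size in (4, 20):
        print("{} threads: {:.2f}s".format(size, time_pool(size)))
```

This only applies to I/O-bound waits. For CPU-bound work the core count is still the relevant limit.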
Let's try spawning processes with multiprocessing:
import psutil
import multiprocessing
from time import sleep
import numpy


def get_url_data(urls):
    for url in urls:
        print(url)
        sleep(1)  # lets assume it will take one second each time


if __name__ == "__main__":
    jobs = []

    # Split URLs into chunks as number of CPUs
    chunks = numpy.array_split(range(100), psutil.cpu_count())

    # Pass each chunk into process
    for url_chunk in chunks:
        jobs.append(multiprocessing.Process(target=get_url_data, args=(url_chunk, )))

    # Start the processes
    for j in jobs:
        j.start()

    # Ensure all of the processes have finished
    for j in jobs:
        j.join()
Benchmark result (about 3 seconds less):
$ time python3_5_2 test2.py
real 0m25.208s
user 0m0.260s
sys 0m0.216s
If you execute ps -aux | grep "test2.py" you will see 5 processes, because one is the main process, which manages the others.
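The chunking step in the multiprocessing example relies on numpy.array_split. If pulling in numpy only for that feels heavy, a standard-library sketch of the same idea could look like this. split_into_chunks is a hypothetical helper, written to give the earlier chunks the leftover items, which matches how array_split distributes a remainder.

```python
def split_into_chunks(items, num_chunks):
    """Split items into num_chunks nearly equal, contiguous lists."""
    items = list(items)
    base, extra = divmod(len(items), num_chunks)
    chunks = []
    start = 0
    for index in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks(range(10), 4):
        print(chunk)
```

Each resulting list can then be passed as args=(chunk, ) to a multiprocessing.Process, exactly as the numpy chunks are.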
There are some drawbacks:
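You did not explain in depth what your code is doing, but if you are doing work that needs to be synchronized, you need to know that multiprocessing does NOT make it safe for you: processes do not share memory by default, so any shared state has to be coordinated explicitly (for example with multiprocessing.Lock or multiprocessing.Queue).
Spawning extra processes introduces I/O overhead, as data has to be shuffled around between processes.
Assuming the data is restricted to each process, it is possible to gain a significant speedup, but be aware of Amdahl's Law: the part of the work that stays serial limits the overall gain.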
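To put a number on the Amdahl's Law remark, here is a small sketch of the usual formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and the 0.9 fraction are illustrative assumptions, not measurements from the benchmarks above.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 90% of the run time can be parallelized.
    for workers in (1, 4, 16):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a 0.9 parallel fraction the bound can never exceed 1 / (1 - 0.9) = 10, no matter how many processes are added.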
If you reveal what your code does afterwards (save to a file? a database? stdout?), it will be easier to give a better answer or direction. A few ideas come to mind, such as immutable infrastructure with Bash or Java to handle synchronization. Or is it a memory-bound issue where you need an object pool to process the JSON responses? It might even be a job for fault-tolerant Elixir.