When using a Python multiprocessing Pool, how many jobs get submitted at a time?
How is that decided, and can it be controlled somehow? For example, keep at most 10 jobs in the queue to reduce memory usage.
Assume I have the backbone code written below. For each chromosome and simulation I read the data as a pandas DataFrame.
(I thought that reading the data before submitting the job would be better, to reduce the I/O load in the worker process.)
Then I send the pandas DataFrame to each worker to process it.
But it seems that far more jobs get submitted than get finished, and this results in a memory error.
import multiprocessing

numofProcesses = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=numofProcesses)
jobs = []
all_result1 = {}
all_result2 = {}

def accumulate(result):
    # callback, runs in the parent process when a worker finishes
    result1 = result[0]
    result2 = result[1]
    all_result1.update(result1)   # merge the per-job results into the global dicts
    all_result2.update(result2)   # (my real merging logic goes here)
    print('ACCUMULATE')

for chrom in chroms:              # chroms and sims stand for my real iteration
    for sim in sims:
        # read in the parent, then ship the DataFrame to the worker
        chrBased_simBased_df = readData(chrom, sim)
        jobs.append(pool.apply_async(func,
                                     args=(chrBased_simBased_df, too, many),
                                     callback=accumulate))
        print('Submitted job:%d' % (len(jobs)))

pool.close()
pool.join()
Is there a way to avoid this?
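For example, would throttling submission with a semaphore that is released in the result callback be a reasonable way to do this? Below is a minimal sketch of what I have in mind (MAX_PENDING is a hypothetical cap; readData, func, the extra args, chroms/sims and the .update() merge are placeholders standing in for my real code):

import threading
import multiprocessing

MAX_PENDING = 10                              # hypothetical cap on unfinished jobs
slots = threading.BoundedSemaphore(MAX_PENDING)

all_result1 = {}
all_result2 = {}

def accumulate(result):
    result1, result2 = result
    all_result1.update(result1)               # placeholder for my real merge logic
    all_result2.update(result2)
    slots.release()                           # a job finished: free one slot

def on_error(err):
    print(err)
    slots.release()                           # free the slot even if the worker failed

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
jobs = []
for chrom in chroms:                          # chroms/sims stand for my real iteration
    for sim in sims:
        slots.acquire()                       # blocks while MAX_PENDING jobs are pending,
                                              # so at most that many DataFrames are alive
        chrBased_simBased_df = readData(chrom, sim)
        jobs.append(pool.apply_async(func,
                                     args=(chrBased_simBased_df, too, many),
                                     callback=accumulate,
                                     error_callback=on_error))
pool.close()
pool.join()

The idea is that slots.acquire() blocks the submission loop whenever MAX_PENDING jobs are still unfinished, so at most that many DataFrames exist in the parent and on the task queue at any time. Is this the right approach, or is there a more standard way?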