I want to investigate multiprocessing. I have a 'tar' archive with, say, 1000 files (in reality there are many more), and each file has 1000 rows. I need to read each file and each line of the file, and save info about each file in some 'result' variable (a dictionary). I have the following code, and for some unknown reason it stops after 8 iterations:
import tarfile
from multiprocessing import Process, Queue

class DataProc():
    ...
    def data_proc(self):
        ...
        result = {}
        read_mode = 'r'
        self.tar = tarfile.open(file_path, read_mode)
        for file in self.tar:
            q = Queue()
            p = Process(target=self.process_tar,
                        args=(file, q))
            p.start()
            tmp_result = q.get()
            for key, item in tmp_result.items():
                '''
                do some logic and save data to result
                '''
                pass
            p.join()
        return result

    def process_tar(self, file, q):
        output = {}
        extr_file = self.tar.extractfile(file)
        content = extr_file.readlines()
        '''
        do some data processing with the file content
        and save the result to output
        '''
        q.put(output)
dp = DataProc()
result = dp.data_proc()
The 'for file in self.tar' loop makes only 8 iterations and then stops. What am I doing wrong?
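For reference, a minimal single-process version of the same traversal works fine on the whole archive. Here the per-file work is just a line count, a hypothetical stand-in for my real processing; the archive is built in memory only so the sketch is self-contained:

```python
import io
import tarfile

def process_archive(fileobj):
    """Read every regular member of a tar archive and return
    {member name: number of lines} -- a stand-in for the real logic."""
    result = {}
    with tarfile.open(fileobj=fileobj, mode='r') as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is None:  # skip directories, links, etc.
                continue
            result[member.name] = len(extracted.readlines())
    return result

# Build a tiny in-memory archive to demonstrate.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    for name, body in [("a.txt", b"1\n2\n3\n"), ("b.txt", b"x\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))
buf.seek(0)

print(process_archive(buf))  # -> {'a.txt': 3, 'b.txt': 1}
```

So the sequential read itself is not the problem; only the Process/Queue version stalls.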