As a newbie to programming, I'm still exploring the concepts of multiprocessing and multithreading.
I've written a small script that reads a file, copies files into multiple temporary folders, and performs the following actions on each folder:
- Build a label.
- Generate a package.
- Push it to Nexus.
There are ~500 folders and they are processed sequentially. How can I use multiprocessing here to process, say, 100 folders at a time in parallel (or even more)? Also, is it possible to keep track of those processes and fail the build if even one subprocess fails?
I've read multiple articles on multiprocessing, but couldn't wrap my head around it :(
Any guidance would be of great help to me, thanks.
Folder structure:
folder1
-- war file
-- metadata
folder2
-- war file
-- metadata
....
....
folder500
-- war file
-- metadata
Code snippet
import re, shutil, os
from pathlib import Path
target = "/home/work"
file_path = target + "/file.txt"
dict = {}
count = 1
def commands_to_run_on_each_folder(filetype, tmp_folder):
    target_folder = tmp_folder + '/tmp' + str(count)
    os.system(<1st command to build the label>)
    os.system(<2nd command to build the package>)
    <multiple file manipulations, where `filetype` is used to get the required file with the right extension>
    <curl command to upload it to Nexus>

# Read the text file and assemble it in a dictionary.
with open(file_path, 'r') as f:
    lines = f.read().splitlines()
for i, line in enumerate(lines):
    match = re.match(r".*.war", line)
    if match:
        j = i - 1 if i > 1 else 0
        for k in range(j, i):
            dict[match.string] = lines[k]

# Iterate the dictionary and copy the folder to the temporary folders.
for key, value in dict.items():
    os.mkdir(target + '/tmp' + str(count))
    shutil.copy(key, target + '/tmp' + str(count))
    shutil.copy(value, target + '/tmp' + str(count))
    commands_to_run_on_each_folder("war", target)
    count += 1
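
One way to set this up for parallel execution is to pull the per-folder work into a single function that takes everything it needs as arguments instead of relying on the global `count`. Below is a minimal sketch only; the `process_one_folder` name is hypothetical, the build/package/upload commands stay as placeholders, and `subprocess.run(..., check=True)` replaces `os.system` so that a failing command raises an exception the caller can see:

import os
import shutil
import subprocess

def process_one_folder(war_path, metadata_path, index, target="/home/work"):
    # Each task gets its own tmp folder, so parallel tasks never collide.
    tmp_folder = os.path.join(target, 'tmp' + str(index))
    os.mkdir(tmp_folder)
    shutil.copy(war_path, tmp_folder)
    shutil.copy(metadata_path, tmp_folder)
    # check=True makes a non-zero exit code raise CalledProcessError,
    # which is how a failure propagates out of this worker.
    subprocess.run("<1st command to build the label>", shell=True, check=True)
    subprocess.run("<2nd command to build the package>", shell=True, check=True)
    # <file manipulations and the curl upload to Nexus go here>
    return tmp_folder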
OS: Ubuntu 18.04, Memory: 22 GB (container)
I would suggest the `concurrent.futures` library because it's pretty high level, and generally straightforward enough to use. Also, `dict` is a built-in name; please do not use it as a variable name.
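
For example, here is a rough sketch of how the folders could be processed with `concurrent.futures.ProcessPoolExecutor`, assuming a worker function like the `process_one_folder` sketched above (the names and the `max_workers=100` value are illustrative, not from your script). `as_completed` collects results as they finish, and any exception raised in a worker is re-raised by `future.result()`, so the build can be failed if even one folder fails:

from concurrent.futures import ProcessPoolExecutor, as_completed

def run_all(folder_pairs, max_workers=100):
    # folder_pairs is a list of (war_path, metadata_path) tuples,
    # e.g. built from the dictionary in the question.
    # process_one_folder is the hypothetical worker sketched above.
    failures = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_war = {
            executor.submit(process_one_folder, war, meta, i): war
            for i, (war, meta) in enumerate(folder_pairs, start=1)
        }
        for future in as_completed(future_to_war):
            war = future_to_war[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failures.append((war, exc))
    if failures:
        # Fail the build if even one folder failed.
        raise SystemExit("%d folder(s) failed: %s" % (len(failures), failures))

One note on sizing: 100 worker processes each running a build and an upload may be more than a 22 GB container can comfortably handle, so `max_workers` is the knob to tune down if you hit memory or CPU limits.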