I have some code (let’s call it a function do_stuff(x)) which makes calls to an external executable using subprocess.run. That external executable needs to read and write local files with hardcoded names (I have no control over the way this was designed). Therefore, if I want to run more than one instance of that code at the same time, I need to actually copy that external executable and its related local files to a separate folder.
In the big picture I need to run do_stuff(x) with around 50 different values of x, so I would like to parallelize the code that calls do_stuff so the batch run takes less time.
I have a handle on how to copy the files I need to a new folder (either something I specify explicitly or by using tempfile) and I have seen examples of various approaches to parallelization using multiprocessing, joblib, asyncio, etc. (for example here), but the catch is that because of the constraint imposed by my external executable, a given call needs to know what worker it is so it can operate in the right folder.
So if, for example, I want 4 workers and I create folders "temp1", "temp2", "temp3", and "temp4", I need whatever parallelization approach I choose to use all four workers and all four temp folders such that only one thing is trying to run in each temp folder at a time. Unless I’m missing something obvious (which is certainly possible), it seems like I would need to have each call to do_stuff discover which worker it is so it can know what folder to work in. I think I might be able to do that by something like os.chdir(f'temp{multiprocessing.current_process()._identity[0]}'), but that relies on a private attribute and seems like a hack that’s prone to breaking (e.g. if the parallelization approach is creating threads instead of processes).
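One possible alternative to discovering the worker's identity is to treat the folders as a pool: each call borrows a free folder from a queue and hands it back when it finishes, so no two calls ever share a folder. The sketch below assumes threads are acceptable here, since most of the time is spent waiting on the external executable. It passes the folder through the cwd argument of subprocess.run instead of calling os.chdir, which would change the directory for every thread in the process. The names run_batch and the two-argument do_stuff(x, folder) are hypothetical, and the command being run is only a stand-in for the real executable.

```python
import queue
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def do_stuff(x, folder):
    # Stand-in for the real work: the actual executable is not known here,
    # so this just runs a tiny Python command with `folder` as its cwd.
    result = subprocess.run(
        [sys.executable, "-c", f"print({x!r} * 2)"],
        cwd=folder,            # child process runs in this folder; no os.chdir
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def run_batch(func, xs, folders):
    """Run func(x, folder) for each x; a folder is never used by two calls at once."""
    free_folders = queue.Queue()
    for folder in folders:
        free_folders.put(folder)

    def task(x):
        folder = free_folders.get()      # blocks until some folder is free
        try:
            return func(x, folder)
        finally:
            free_folders.put(folder)     # hand the folder back for reuse

    # One worker per folder, so a free folder is always available to a running task.
    with ThreadPoolExecutor(max_workers=len(folders)) as pool:
        return list(pool.map(task, xs))
```

With the four folders already created and populated, a call such as run_batch(do_stuff, range(50), ["temp1", "temp2", "temp3", "temp4"]) would return the results in the same order as the inputs. Because the folder is borrowed per call rather than tied to a worker, the same idea should carry over to other pool types, although a process pool would need a queue that can be shared between processes.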
I suppose I could have every single call to do_stuff make its own copy, execute, and clean up after itself, but that seems like a lot of wasteful file system operations which won’t come for free either.
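For reference, the copy-per-call variant can be kept fairly small with a context manager. This is only a sketch: private_copy and the template folder it copies from are hypothetical names, and it assumes everything the executable needs lives in a single folder. Whether the copying cost matters would have to be measured, but it would be paid once per do_stuff call (about 50 times in total), not once per call to the executable.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_copy(template_dir):
    """Yield a fresh copy of template_dir, then delete the copy on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp) / "work"
        # copytree creates work_dir itself and copies the whole folder into it.
        shutil.copytree(template_dir, work_dir)
        yield work_dir
    # Leaving the TemporaryDirectory block removes tmp and everything in it.
```

Inside do_stuff this would be used as "with private_copy(template) as folder:", with every subprocess.run call in that block given cwd=folder.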
In case it’s relevant to recommending an approach, the call to the external executable takes roughly 1 second, but do_stuff typically calls that executable between 20 and 500 times as dictated by scipy.optimize.minimize.