I am trying to run a multiprocessing pool inside a class to calculate several values that rely on methods of a much larger class. I take the initial parameter values, add normally distributed random noise with a spread of 0.05 ("5%") to each value, and calculate the log likelihood of the perturbed values.
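The spread is described as 5%, but do_fit below adds 0.05 * randn to each parameter, which is an absolute spread and only equals 5% for parameters of order one. The sketch below shows both variants side by side, assuming NumPy. The helper name perturb and its relative flag are hypothetical, not part of the original code.

```python
import numpy as np


def perturb(p0, nsize=128, spread=0.05, relative=False, seed=None):
    """Return an (nsize, len(p0)) array of perturbed parameter vectors (hypothetical helper).

    relative=False: p0 + spread * N(0, 1)        (absolute spread, as in do_fit)
    relative=True:  p0 * (1 + spread * N(0, 1))  (spread is a fraction of each value)
    """
    rng = np.random.default_rng(seed)
    p0 = np.asarray(p0, dtype=float)
    noise = rng.standard_normal((nsize, p0.size))
    if relative:
        return p0 * (1.0 + spread * noise)
    return p0 + spread * noise


# Example: 128 starting points scattered by 5% of each parameter's value
pos = perturb([0.3, 2.0, 150.0], nsize=128, spread=0.05, relative=True, seed=42)
```

With relative=True a parameter of 150 scatters by about 7.5, while with the absolute form it would scatter by only 0.05.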
Here is a (somewhat useless) snippet of the code I am using. make_model is very long, but it returns a model of the velocity, which I compare to the measured values stored in a data container.
```python
import multiprocessing as mp
import time

import astropy.units as u
import numpy as np


class fit_time_dependent():
    def __init__(self):
        # ... set up a bunch of things here (including the data container self.datum)
        pass

    def make_model(self, pars):
        # ... a long computation that builds the velocity model from pars
        velocity_model = ...
        return velocity_model

    def log_likelihood_pass1(self, pars):
        velocity_model = self.make_model(pars)
        totallogprob = 0.0
        if self.datum.velocities:
            for inst in self.datum.velocity_instruments:
                vsh_data = self.datum.get_velocity(inst)
                vsh_data_y = vsh_data["vsh"] * u.km / u.s
                vsh_data_y_err = vsh_data["vsh_err"] * u.km / u.s
                sigma2 = vsh_data_y_err ** 2  # + model ** 2
                chi2 = np.sum((vsh_data_y - velocity_model) ** 2 / sigma2)
                # convert the dimensionless Quantity to a plain float
                totallogprob += -0.5 * chi2.to_value(u.dimensionless_unscaled)
        return totallogprob

    def log_prob_pass1(self, pars):
        lp = self.log_prior(pars)  # log_prior is defined elsewhere in the class
        if not np.isfinite(lp):
            return -np.inf
        return lp + self.log_likelihood_pass1(pars)

    def do_fit(self):
        p0 = ...  # initial values from a previous fit of the model to the data
        nsize = 128
        spread = 0.05
        # nsize rows, each p0 plus Gaussian noise with standard deviation `spread`
        pos = np.array(p0) + spread * np.random.randn(nsize, len(p0))

        time_start_pool = time.perf_counter()
        with mp.Pool(8) as pool:
            results_pool = pool.map(self.log_prob_pass1, pos)
        time_elapsed_pool = time.perf_counter() - time_start_pool
        print("Pool - map - %s seconds" % time_elapsed_pool)

        time_start_serial = time.perf_counter()
        results_serial = np.asarray(list(map(self.log_prob_pass1, pos)))
        time_elapsed_serial = time.perf_counter() - time_start_serial
        print("Serial - map - %s seconds" % time_elapsed_serial)
```
The issue is that this calculation has to be repeated many times, and running it on a single core would take far too long.
When I compare the pool version with the serial version, the pool version is much slower:
Pool - map - 296.5006010532379 seconds
Serial - map - 17.647610187530518 seconds
Additionally, I was watching my CPU usage, and the pool does not seem to use any of the cores I requested.
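One thing worth measuring is how much data is sent to the workers. pool.map(self.log_prob_pass1, pos) has to pickle the bound method, and pickling a bound method pickles the whole instance, so the main process serializes self for every chunk of tasks. If self is large, that work happens in the main process and could leave the worker cores idle. The sketch below measures the cost; Big and pickle_cost are hypothetical stand-ins, not part of the original code.

```python
import pickle
import time

import numpy as np


class Big:
    """Hypothetical stand-in for a class that holds a lot of data."""

    def __init__(self, n=1_000_000):
        self.data = np.zeros(n)  # about 8 MB of float64

    def log_prob(self, pars):
        return 0.0


def pickle_cost(obj):
    """Return (size in bytes, seconds) needed to pickle obj."""
    t0 = time.perf_counter()
    payload = pickle.dumps(obj)
    return len(payload), time.perf_counter() - t0


if __name__ == "__main__":
    big = Big()
    # pickling the bound method also pickles `big` and its array
    nbytes, seconds = pickle_cost(big.log_prob)
    print(f"bound method pickles to {nbytes} bytes in {seconds:.4f} s")
```

Running pickle_cost(fit.log_prob_pass1) on a real fit_time_dependent instance and multiplying by the number of chunks pool.map creates gives a rough estimate of the serialization overhead. A slow or failing pickle of astropy or data-container attributes would also show up here.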
I've also tried pathos/multiprocess with their different pool options (ProcessPool, ParallelPool, ThreadPool). I would like to keep it simple and avoid managing Process objects directly, but I will if it comes to that.
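A common pattern that keeps Pool (no explicit Process objects) while avoiding per-task pickling of self is to hand the object to each worker once through Pool's initializer and initargs, then map a module-level function instead of a bound method. Because the calculation is repeated many times, the pool is also created once and reused. This is a minimal sketch that assumes the slowdown is dominated by that serialization, which has not been verified for the original class. ToyFit, _init_worker, _log_prob_task and pool_log_prob are hypothetical names, and ToyFit stands in for fit_time_dependent.

```python
import multiprocessing as mp

import numpy as np

_worker_fit = None  # per-process copy of the fit object, set by _init_worker


class ToyFit:
    """Hypothetical stand-in for fit_time_dependent holding a large array."""

    def __init__(self, n_data=200_000, seed=0):
        rng = np.random.default_rng(seed)
        self.data = rng.normal(size=n_data)

    def log_prob_pass1(self, pars):
        # toy Gaussian log likelihood of the data for mean pars[0]
        return -0.5 * float(np.sum((self.data - pars[0]) ** 2))


def _init_worker(fit):
    # runs once in each worker process; stores the object so tasks need not carry it
    global _worker_fit
    _worker_fit = fit


def _log_prob_task(pars):
    # module-level function: it is pickled by name, without any instance data
    return _worker_fit.log_prob_pass1(pars)


def pool_log_prob(pool, pos):
    return pool.map(_log_prob_task, pos)


if __name__ == "__main__":
    fit = ToyFit()
    p0 = np.array([0.0, 1.0])
    # create the pool once and reuse it for every repetition of the calculation
    with mp.Pool(8, initializer=_init_worker, initargs=(fit,)) as pool:
        for _ in range(3):
            pos = p0 + 0.05 * np.random.randn(128, len(p0))
            results = pool_log_prob(pool, pos)
    print(len(results))
```

Each worker holds its own copy of the object from the moment the pool starts, so changes made to it in the main process afterwards are not seen by the workers; recreate the pool if the object's state changes between repetitions.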
This question seems similar to my problem, but not quite the same: https://stackoverflow.com/questions/66790158/how-to-make-use-of-a-multiprocessing-manager-within-a-class
Thanks for the help.