
Why is Python multiprocessing using class functions slower than in serial for this code?


I am trying to run a multiprocessing pool inside a class to evaluate several parameter sets, using methods of a much larger class. The idea is to take the initial parameter values, perturb each set with a 5% normal spread, and calculate the log likelihood of the perturbed values.

Here is a (somewhat useless) snippet of the code I am using. make_model is very long, but it spits out a velocity model which I compare to the measured values stored in a data container.

import multiprocessing as mp
import time as time_counter

import numpy as np
import astropy.units as u


class fit_time_dependent():
    def __init__(self):
        # set up a bunch of things here
        ...

    def make_model(self, pars):
        # a bunch of things here too
        return velocity_model

    def log_likelihood_pass1(self, pars):
        velocity_model = self.make_model(pars)

        totallogprob = 0
        if self.datum.velocities:
            for inst in self.datum.velocity_instruments:

                vsh_data = self.datum.get_velocity(inst)
                vsh_data_y = vsh_data["vsh"] * u.km / u.s
                vsh_data_y_err = vsh_data["vsh_err"] * u.km / u.s

                sigma2 = vsh_data_y_err ** 2  # + model ** 2
                totallogprob += -0.5 * np.sum((vsh_data_y - velocity_model) ** 2 / sigma2)

        return totallogprob.value

    def log_prob_pass1(self, pars):
        lp = self.log_prior(pars)
        if not np.isfinite(lp):
            return -np.inf
        return lp + self.log_likelihood_pass1(pars)

    def do_fit(self):
        p0 = ...  # initial values from a previous fit of the model to the data

        nsize = 128
        spread = 0.05
        pos = np.array(p0) + spread * np.random.randn(nsize, len(p0))

        time_start_pool = time_counter.time()
        pool = mp.Pool(8)
        results_pool = pool.map(self.log_prob_pass1, pos)
        time_end_pool = time_counter.time()
        time_elapsed_pool = float(time_end_pool) - float(time_start_pool)
        print("Pool - map - %s seconds" % time_elapsed_pool)

        time_start_serial = time_counter.time()
        results_serial = np.asarray(list(map(self.log_prob_pass1, pos)))
        time_end_serial = time_counter.time()
        time_elapsed_serial = float(time_end_serial) - float(time_start_serial)
        print("Serial - map - %s seconds" % time_elapsed_serial)

The issue is that this calculation has to be repeated many times, and running on a single core would take far too long.

When timing the pool version against the serial version, I get a huge performance hit from using the pool:

Pool - map - 296.5006010532379 seconds
Serial - map - 17.647610187530518 seconds
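
In case the size of the class matters: my understanding is that pool.map has to pickle the bound method self.log_prob_pass1, and therefore the whole fit_time_dependent instance, to send work to each worker. A rough check of that cost would be something like the sketch below (fitter just stands in for the instance I actually use, and the dumps call will raise if anything in the instance cannot be pickled):

import pickle
import time

fitter = fit_time_dependent()                    # the instance used for the fit
t0 = time.time()
payload = pickle.dumps(fitter.log_prob_pass1)    # pickles `self` along with the method
print("pickled payload: %d bytes in %.3f s" % (len(payload), time.time() - t0))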

Additionally, I was watching my CPU usage, and it seems that the pool doesn’t use any of the cores that I requested:

[screenshot: CPU usage]

I’ve tried pathos/multiprocess with their different pool options (ProcessPool, ParallelPool, ThreadPool). I would like to keep it simple and not have to use Process directly, but if it comes to that, fine.
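
For reference, the pathos attempts looked roughly like this (just a sketch of the call inside do_fit; I only swapped the pool class and kept the same map call):

from pathos.pools import ProcessPool, ThreadPool   # ParallelPool used the same way

pool = ProcessPool(nodes=8)                # or ThreadPool(nodes=8)
results_pool = pool.map(self.log_prob_pass1, pos)
pool.close()
pool.join()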

This seems similar to my problem but not exactly: https://stackoverflow.com/questions/66790158/how-to-make-use-of-a-multiprocessing-manager-within-a-class

Thanks for the help.


