
Why is Python multiprocessing with class methods slower than running in serial for this code?


I am trying to run a multiprocessing pool within a class to evaluate, in parallel, quantities that depend on methods of a much larger class. The idea is to take a set of initial parameter values, perturb each one with a random 5% normal spread, and compute the new log likelihood for each perturbed set.

Here is a (heavily stripped-down) snippet of the code I am using. make_model is very long, but it ultimately produces a velocity model that I compare to the measured values stored in a data container.

import multiprocessing as mp
import time as time_counter

import numpy as np
import astropy.units as u


class fit_time_dependent:
    def __init__(self):
        # set up a bunch of things here (data container, priors, etc.)
        ...

    def make_model(self, pars):
        # a bunch of things here too; builds the velocity model for these parameters
        ...
        return velocity_model

    def log_likelihood_pass1(self, pars):
        velocity_model = self.make_model(pars)

        totallogprob = 0
        if self.datum.velocities:
            for inst in self.datum.velocity_instruments:

                vsh_data = self.datum.get_velocity(inst)
                vsh_data_y = vsh_data["vsh"] * u.km / u.s
                vsh_data_y_err = vsh_data["vsh_err"] * u.km / u.s

                sigma2 = vsh_data_y_err ** 2  # + model ** 2
                totallogprob += -0.5 * np.sum((vsh_data_y - velocity_model) ** 2 / sigma2)

        return totallogprob.value

    def log_prob_pass1(self, pars):
        lp = self.log_prior(pars)
        if not np.isfinite(lp):
            return -np.inf
        return lp + self.log_likelihood_pass1(pars)

    def do_fit(self):
        p0 = ...  # some initial values from a previous fit of the model to the data

        nsize = 128
        spread = 0.05
        pos = np.array(p0) + spread * np.random.randn(nsize, len(p0))

        time_start_pool = time_counter.time()
        pool = mp.Pool(8)
        results_pool = pool.map(self.log_prob_pass1, pos)
        time_end_pool = time_counter.time()
        time_elapsed_pool = float(time_end_pool) - float(time_start_pool)
        print("Pool - map - %s seconds" % time_elapsed_pool)

        time_start_serial = time_counter.time()
        results_serial = np.asarray(list(map(self.log_prob_pass1, pos)))
        time_end_serial = time_counter.time()
        time_elapsed_serial = float(time_end_serial) - float(time_start_serial)
        print("Serial - map - %s seconds" % time_elapsed_serial)

The issue is that this calculation has to be repeated many times, and running on a single core would take far too long.

When timing the pool version against the serial version, I get a huge performance hit for using the pool.

Pool - map - 296.5006010532379 seconds
Serial - map - 17.647610187530518 seconds

Additionally, I was watching my CPU usage, and it seems that the pool doesn't actually use the cores I requested:

[Screenshot: CPU usage during the run]

I've also tried pathos/multiprocess with their different pool classes (ProcessPool, ParallelPool, ThreadPool). I would like to keep things simple and not manage Process objects by hand, but if it comes to that, fine.
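In case it matters, this is roughly how I called the pathos pools (ProcessPool shown; ThreadPool and ParallelPool were swapped in the same way, and square here is just a placeholder for self.log_prob_pass1):

from pathos.pools import ProcessPool


def square(x):
    # placeholder for the real self.log_prob_pass1
    return x * x


if __name__ == "__main__":
    pool = ProcessPool(nodes=8)
    print(pool.map(square, range(16)))
    pool.close()
    pool.join()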

This seems similar to my problem, but not exactly the same: https://stackoverflow.com/questions/66790158/how-to-make-use-of-a-multiprocessing-manager-within-a-class

Thanks for the help.


