
#### Lin Han

##### Guest

Code:

```
import numpy as np
from numba import njit, prange
from mpi4py import MPI

#####MPI setting#####
comm=MPI.COMM_WORLD
rank=comm.Get_rank()
size=comm.Get_size()
N_theta_to_scan=1000
n_per_proc=int(N_theta_to_scan/size)
n_more=int(N_theta_to_scan%size)
if rank<n_more:
    start=rank*(n_per_proc+1)
    number_to_cal=n_per_proc+1
else:
    start=n_more*(n_per_proc+1)+(rank-n_more)*n_per_proc
    number_to_cal=n_per_proc

#####function#####
@njit(parallel=True)
def func1(na,nb,nc):
    to_sum=np.zeros(na*nb*nc)
    for a in prange(0,na):
        for b in prange(0,nb):
            for c in prange(0,nc):
                to_sum[a*nb*nc+b*nc+c]=a*b*c
    out=np.sum(to_sum)
    return out

@njit(parallel=True)
def func2(start,number_to_cal):
    to_sum=np.zeros(number_to_cal)
    for i in prange(start,start+number_to_cal):
        to_sum[i-start]=func1(i,i*i,i*i*i)
    out2=np.sum(to_sum)
    return out2

#####main section#####
to_be_gather=np.array([func2(start,number_to_cal)])
gatheres=np.zeros(0)
comm.Reduce(to_be_gather,gatheres,op=MPI.SUM)
```
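The block-splitting logic in the "MPI setting" part can be checked on its own, without MPI or Numba. The following is only a sketch: `block_range` and `covers_everything` are hypothetical helper names that do not appear in the script, and `//` is used in place of `int(.../size)`, which gives the same result for the sizes involved here.

```python
def block_range(rank, size, n_total):
    """Return (start, count) for one rank, mirroring the if/else in the script."""
    n_per_proc = n_total // size
    n_more = n_total % size
    if rank < n_more:
        # The first n_more ranks each take one extra index.
        return rank * (n_per_proc + 1), n_per_proc + 1
    return n_more * (n_per_proc + 1) + (rank - n_more) * n_per_proc, n_per_proc


def covers_everything(size, n_total):
    """Check that the per-rank blocks tile range(n_total) without gaps or overlaps."""
    expected_start = 0
    for rank in range(size):
        start, count = block_range(rank, size, n_total)
        if start != expected_start:
            return False
        expected_start = start + count
    return expected_start == n_total
```

With this check the partition itself looks consistent, so the index ranges handed to `func2` do not appear to be the source of the failure.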

Running the script produces the following error:

Code:

```
MemoryError: Allocation failed (probably too large).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/workspace/wuliang/hanlin/test/test.py", line 38, in <module>
    to_be_gather=func2(start,number_to_cal)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: CPUDispatcher(<function func2 at 0x2aed19d7b420>) returned a result with an exception set
```
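One way to see where a `MemoryError` like this could come from is to estimate how large the `np.zeros(na*nb*nc)` buffer inside `func1(i,i*i,i*i*i)` becomes as `i` grows. This is a rough sketch, assuming the array is float64 (8 bytes per element, the default dtype of `np.zeros`); `func1_buffer_bytes` is a hypothetical helper, not part of the script.

```python
def func1_buffer_bytes(i, itemsize=8):
    """Bytes requested by np.zeros(na*nb*nc) inside func1(i, i*i, i*i*i).

    itemsize=8 assumes float64, the default dtype of np.zeros.
    """
    na, nb, nc = i, i * i, i * i * i
    return itemsize * na * nb * nc


# Print the requested size for a few values of i from the scanned range.
for i in (10, 100, 999):
    print(f"i={i:4d}: {func1_buffer_bytes(i):.3e} bytes")
```

Under these assumptions the request is about 8 MB at `i=10` but about 8 TB at `i=100`, and `func2` may run several such calls at once under `prange`. Rank 0 is assigned the smallest `i` values, which might be why the `if rank==0:` run succeeds.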

Next, I removed `func1()` and changed `func2()` by replacing `to_sum[i-start]=func1(i,i*i,i*i*i)` with `to_sum[i-start]=i`; that version ran successfully. I also tried letting only rank 0 execute the main section, i.e. running it under the condition `if rank==0:`, and that ran successfully too. Where does the error come from, and what should I change to get the functionality the current code is aiming for?

For context: I'm trying to accelerate a piece of Python code with Numba, and the same code also uses mpi4py for parallelization. The code above is my attempt at a minimal example that reproduces the error.
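If the large temporary array in `func1` is what fails to allocate, one possible direction is to avoid building it at all. For this particular example the triple sum of `a*b*c` factorizes into a product of three single sums, so it can be computed directly. The sketch below is plain Python and only illustrates the idea; `tri`, `func1_closed_form` and `func1_brute_force` are hypothetical names, and the real function this example stands in for may not factorize this way.

```python
def tri(n):
    """Sum of 0, 1, ..., n-1."""
    return n * (n - 1) // 2


def func1_closed_form(na, nb, nc):
    """Sum of a*b*c over a<na, b<nb, c<nc, without any temporary array."""
    return tri(na) * tri(nb) * tri(nc)


def func1_brute_force(na, nb, nc):
    """Same sum via explicit loops and a scalar accumulator (for checking)."""
    total = 0
    for a in range(na):
        for b in range(nb):
            for c in range(nc):
                total += a * b * c
    return total
```

Even when no closed form exists, accumulating into a scalar as in `func1_brute_force` needs no `na*nb*nc` buffer, although the loop count still grows like `i**6`. Inside an `@njit` function these products would overflow 64-bit integers for large `i`, so floats would be needed there. Separately, `comm.Reduce` writes its result into the receive buffer on the root rank, so `gatheres=np.zeros(0)` would probably need to be `np.zeros(1)` to hold the single summed value.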
