
#### Ishigami

##### Guest

Code:

```
import pandas as pd
data = {
    "Race_ID": [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "Student_ID": [1, 2, 3, 4, 5, 9, 10, 2, 3, 6, 5],
    "theta": [8, 9, 2, 12, 4, 5, 30, 3, 2, 1, 50],
}
df = pd.DataFrame(data)
```

And I would like to create a new column `df['feature']` by the following method: within each `Race_ID`, suppose the `Student_ID` is equal to i; then we define the feature using the following function:
Code:

```
def f(thetak, thetaj, thetai, *theta):
    # Product of the theta values of all remaining students (the l's).
    prod = 1
    for t in theta:
        prod = prod * t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod
```

where k, j, l are the `Student_ID`s within the same `Race_ID` such that k ≠ i, j ≠ i, k, and l ≠ k, j, i, and theta_i is the `theta` of the student whose `Student_ID` equals i. The feature is the sum of f over all such orderings of k, j and the l's. So for example, for `Race_ID` = 2, `Student_ID` = 1, writing f(k, j, i, l, ...) with `Student_ID`s as shorthand for the corresponding `theta` values, the feature equals

f(2,3,1,4,5) + f(2,3,1,5,4) + f(2,4,1,3,5) + f(2,4,1,5,3) + f(2,5,1,3,4) + f(2,5,1,4,3) +
f(3,2,1,4,5) + f(3,2,1,5,4) + f(3,4,1,2,5) + f(3,4,1,5,2) + f(3,5,1,2,4) + f(3,5,1,4,2) +
f(4,2,1,3,5) + f(4,2,1,5,3) + f(4,3,1,2,5) + f(4,3,1,5,2) + f(4,5,1,2,3) + f(4,5,1,3,2) +
f(5,2,1,3,4) + f(5,2,1,4,3) + f(5,3,1,2,4) + f(5,3,1,4,2) + f(5,4,1,2,3) + f(5,4,1,3,2)

which is equal to 299.1960138012742.
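To make the definition concrete, here is a minimal brute-force sketch of the sum for a single student. The helper name `brute_force_feature` is hypothetical, and it assumes that the first two entries of each ordering play the roles of k and j, as in the expansion above.

```python
from itertools import permutations


def f(thetak, thetaj, thetai, *theta):
    # Same function as above: ratio term times the product of the remaining thetas.
    prod = 1
    for t in theta:
        prod = prod * t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod


def brute_force_feature(theta_i, other_thetas):
    # Hypothetical helper: sum f over every ordering of the other students' thetas.
    # In each ordering p, p[0] is theta_k, p[1] is theta_j and p[2:] are the l's.
    return sum(f(p[0], p[1], theta_i, *p[2:]) for p in permutations(other_thetas))


# Race_ID = 2, Student_ID = 1: theta_i = 8 and the other thetas are 9, 2, 12, 4.
feature_2_1 = brute_force_feature(8, [9, 2, 12, 4])
print(feature_2_1)  # approximately 299.196
```

This enumerates all (n-1)! orderings, so it is only practical for very small races.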

But as one quickly realises, the number of terms in the sum grows super-exponentially with the number of students in a race: if there are n students in a race, then there are (n-1)! terms in the sum.

Fortunately, due to the symmetry property of f, we can reduce the number of terms to a mere (n-1)(n-2) terms by noting the following:

Let i, j, k be given and let 1, 2, 3 (for the sake of example) be different from i, j, k (i.e. 1, 2, 3 are the students passed in `*theta`). Then f(k,j,i,1,2,3) = f(k,j,i,1,3,2) = f(k,j,i,2,1,3) = f(k,j,i,2,3,1) = f(k,j,i,3,1,2) = f(k,j,i,3,2,1), because `*theta` only enters f through a product. Hence we can reduce the number of terms if we just compute any one of these terms and then multiply it by (n-3)!
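The symmetry argument can be sketched in code as follows. This is only an illustration: the helper name `reduced_feature` is hypothetical, the formula of f is inlined, and it assumes each race has at least 3 students so that (n-3)! is defined.

```python
from itertools import permutations
from math import factorial, prod


def reduced_feature(theta_i, other_thetas):
    # Hypothetical helper: sum over ordered pairs (k, j) only, then scale by (n-3)!.
    n = len(other_thetas) + 1  # number of students in the race, including i
    total = 0.0
    for k, j in permutations(range(len(other_thetas)), 2):
        theta_k, theta_j = other_thetas[k], other_thetas[j]
        # Product over the remaining students; it does not depend on their order.
        rest = prod(t for idx, t in enumerate(other_thetas) if idx not in (k, j))
        total += (theta_i + theta_j) / (theta_i + theta_j + theta_i * theta_k) * rest
    return factorial(n - 3) * total


# Race_ID = 2, Student_ID = 1: only 4 * 3 = 12 terms instead of 4! = 24.
print(reduced_feature(8, [9, 2, 12, 4]))  # approximately 299.196
```

Iterating over positions rather than values keeps the pairs distinct even if two students happen to share the same `theta`.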

So for example, for `Race_ID` = 5, `Student_ID` = 9, there would have been 5! = 120 terms to sum, but using the above symmetry property, we only have to sum 5 x 4 = 20 terms (5 choices for k, 4 choices for j, and one arbitrary ordering of the l's), namely

f(2,3,9,5,6,10) + f(2,5,9,3,6,10) + f(2,6,9,3,5,10) + f(2,10,9,3,5,6) +
f(3,2,9,5,6,10) + f(3,5,9,2,6,10) + f(3,6,9,2,5,10) + f(3,10,9,2,5,6) +
f(5,2,9,3,6,10) + f(5,3,9,2,6,10) + f(5,6,9,2,3,10) + f(5,10,9,2,3,6) +
f(6,2,9,3,5,10) + f(6,3,9,2,5,10) + f(6,5,9,2,3,10) + f(6,10,9,2,3,5) +
f(10,2,9,3,5,6) + f(10,3,9,2,5,6) + f(10,5,9,2,3,6) + f(10,6,9,2,3,5)

and the feature for student 9 in race 5 will be equal to the above sum times 3!, which is approximately 53588.1977589.

So my question is: how do I write the sum for the above dataframe? I have computed three of the features by hand for checking, and the desired outcome looks like:

Code:

```
import pandas as pd
x = None  # placeholder for features not yet computed by hand
data = {
    "Race_ID": [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "Student_ID": [1, 2, 3, 4, 5, 9, 10, 2, 3, 6, 5],
    "theta": [8, 9, 2, 12, 4, 5, 30, 3, 2, 1, 50],
    "feature": [299.1960138012742, 268.93506341257876, x, x, x, 53588.1977589, x, x, x, x, x],
}
df = pd.DataFrame(data)
```
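For reference, one possible way to fill the whole column is sketched below. It is not necessarily the most efficient approach: the helper `race_features` is a hypothetical name, the sketch assumes every race has at least 3 students, and it applies the (n-1)(n-2)-term reduction per race through `groupby(...).transform`.

```python
from itertools import permutations
from math import factorial, prod

import pandas as pd


def race_features(thetas):
    # Hypothetical helper: features for every student of one race, in row order.
    values = list(thetas)
    n = len(values)
    out = []
    for i, theta_i in enumerate(values):
        others = values[:i] + values[i + 1:]
        total = 0.0
        for k, j in permutations(range(n - 1), 2):
            # Product of the thetas left over once k and j are fixed.
            rest = prod(t for idx, t in enumerate(others) if idx not in (k, j))
            ratio = (theta_i + others[j]) / (theta_i + others[j] + theta_i * others[k])
            total += ratio * rest
        out.append(factorial(n - 3) * total)
    # Return a Series aligned with the rows of this race.
    return pd.Series(out, index=thetas.index)


df = pd.DataFrame({
    "Race_ID": [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "Student_ID": [1, 2, 3, 4, 5, 9, 10, 2, 3, 6, 5],
    "theta": [8, 9, 2, 12, 4, 5, 30, 3, 2, 1, 50],
})
df["feature"] = df.groupby("Race_ID")["theta"].transform(race_features)
print(df)
```

Recomputing the leftover product inside the inner loop keeps the sketch safe when some `theta` is 0; dividing a precomputed total product by theta_k * theta_j would be faster but would fail in that case.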

Thank you so much.