
#### Lukas Hierzegger

##### Guest

I am trying to train a Fully Connected Neural Network (FCN) to learn the behavior of an Ordinary Differential Equation (ODE).

## Initial Problem

My ODE describes the behavior of a heating system and is as follows:

Code:

```
# Model for the differential equation
def model(t, v_I):
    q_H = (q_max / (u_0 * c_I)) * u if t >= T_0 else 0
    dvi_dt = (-alpha * A / c_I) * (v_I - v_A) + q_H
    return dvi_dt
```
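
Written out, the model above corresponds to the following ODE (reading the symbols directly off the code):

$$\frac{dv_I}{dt} = -\frac{\alpha A}{c_I}\left(v_I - v_A\right) + q_H(t), \qquad q_H(t) = \begin{cases} \dfrac{q_{\max}}{u_0\, c_I}\, u, & t \ge T_0 \\[4pt] 0, & t < T_0 \end{cases}$$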

As initial condition I am using `v_I_0 = 0`. My parameters are:
Code:

```
# Parameters
alpha = 3      # W/(m^2*K)
A = 20         # m^2
c_I = 61250    # J/K
q_max = 1000   # W
u_0 = 200      # V
T_0 = 300      # s
v_I0 = 0       # Celsius
```

My disturbance value is `v_A_t`, which will be constant for now, and my control input is `u_t` (the voltage of my heating system in V), which will also be constant for now:
Code:

```
# Disturbance v_A(t) in Celsius
v_A_t = -15
# Control input u(t) in V
u_t = 200.0
```

However, in order to obtain differentiability I changed the `q_H` term to:
Code:

```
# Smooth version of q_H(t) using tanh
def smooth_q_H(t, T_0, q_max, c_0, c_I, c_constant):
    epsilon = 10.0  # smoothing parameter, can be adjusted
    return (q_max / (c_0 * c_I)) * c_constant * (0.5 * (torch.tanh(epsilon * (t - T_0)) + 1))
```
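
For reference, here is a minimal, self-contained sketch of how the reference solution can be obtained with `solve_ivp` (this is only a stand-in for the notebook code: `np.tanh` replaces `torch.tanh` because `solve_ivp` works on plain floats, and I assume `c_0` and `c_constant` play the roles of `u_0` and `u_t`):

```
# Sketch: integrate the heating ODE with the smoothed q_H term using scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

alpha, A, c_I = 3, 20, 61250        # W/(m^2*K), m^2, J/K
q_max, u_0, T_0 = 1000, 200, 300    # W, V, s
v_A_t, u_t = -15, 200.0             # disturbance in Celsius, control voltage in V

def smooth_q_H_np(t):
    # smoothed step: ~0 before T_0, (q_max / (u_0 * c_I)) * u_t after T_0
    return (q_max / (u_0 * c_I)) * u_t * 0.5 * (np.tanh(10.0 * (t - T_0)) + 1)

def model(t, v_I):
    return (-alpha * A / c_I) * (v_I - v_A_t) + smooth_q_H_np(t)

sol = solve_ivp(model, t_span=(0, 4500), y0=[0.0], dense_output=True)
t_plot = np.linspace(0, 4500, 500)
plt.plot(t_plot, sol.sol(t_plot)[0], color="grey")
plt.xlabel("t in s")
plt.ylabel("v_I in °C")
plt.show()
```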

If I solve the function with `solve_ivp` from `scipy.integrate`, I get the following plot:

[Solver solution of the ODE](https://i.sstatic.net/WqLMQIwX.png)

## Training setup

Next, I defined a Fully Connected Neural Network with 1 input layer, 3 hidden layers with 32 neurons each, and 1 output layer. The input is my time `t`, the output is my `v_I`. As activation function I use `tanh`, and I use Xavier weight initialization. I use an Adam optimizer with learning rate 0.001, and I have a lambda value to regulate the impact of my loss later on.

My loss function is described as follows:

Code:

```
# Function for computing the losses
def calculate_losses():
    # Initial loss
    v_I_0 = pinn(t_boundary)
    loss1 = (torch.squeeze(v_I_0) - 0)**2
    # Physics loss
    v_I_pinn = pinn(t_physics)
    dvIdt_pinn = torch.autograd.grad(v_I_pinn, t_physics, torch.ones_like(v_I_pinn), create_graph=True)[0]
    residual = dvIdt_pinn - ((alpha * A) / c_I) * (v_A_constant - v_I_pinn) - (q_max / (c_0 * c_I)) * c_constant * (0.5 * (torch.tanh(10.0 * (t_physics - T_0)) + 1))
    loss2 = torch.mean(residual ** 2)
    loss = loss1 + lambda1 * loss2
    return loss, loss1, loss2, v_I_0
```
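
In equation form, the total loss being minimized is:

$$\mathcal{L} = \underbrace{\hat v_I(0)^2}_{\text{loss1}} \;+\; \lambda_1 \underbrace{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{d\hat v_I}{dt}(t_i) - \frac{\alpha A}{c_I}\bigl(v_A - \hat v_I(t_i)\bigr) - q_H(t_i)\right)^{2}}_{\text{loss2}}$$

where \(\hat v_I\) is the network output and \(q_H\) is the smoothed heating term from above.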

First, I compute my **initial boundary loss** (`loss1`). As stated above, I wish to have `v_I_0 = 0`. For my initial loss, I define an initial t-set, which basically consists of just one point (0). I compute `v_I_0` by passing that `0` t-point through the network and taking its output.

Next, I compute my **physics loss** (`loss2`). In order to do that, I rearrange my ODE to 0 and store that value in the variable `residual` (spoiler: I feel like there must be a type-casting error or similar here; however, the ODE should not be the problem). I compute the residual over my `t_physics` tensor, which is defined as follows:
Code:

```
# define training points over the entire domain, for the physics loss
N_col = 300
t_physics = torch.linspace(1, 4500, N_col, device=device).view(-1, 1).requires_grad_(True)
```
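
The boundary set mentioned above consists of just the single point t = 0; as a sketch (the exact line in the notebook may differ):

```
# single boundary point t = 0 for the initial-condition loss
t_boundary = torch.tensor([[0.0]], device=device)
```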

Afterwards, I return the values `loss`, `loss1`, `loss2`, and `v_I_0`, mostly for logging purposes though.

## Training

The actual training is quite straightforward: I just let it run for a certain number of iterations. Usually I cancel the training, though, once I feel like there is no improvement. The longest I let the FCN train was about 8 hours with CUDA.

Code:

```
# Initialization of the losses
loss1, loss2 = 0, 0
for i in tqdm(range(500001), desc="Processing"):
    loss, loss1, loss2, v_I_0 = calculate_losses()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

## Training progress

I plot the training progress every 5,000 iterations. Within my plot I display my t_physics point set (green dots), the output of my network (green line), the output of the solver, i.e. the true solution of the ODE (grey line), and my residual (blue line).
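
For illustration, a sketch of the kind of plotting helper this corresponds to (`sol_t` and `sol_v` are hypothetical names assumed to hold the `solve_ivp` reference solution; the residual curve is omitted here for brevity):

```
import matplotlib.pyplot as plt

def plot_progress(i, sol_t, sol_v):
    # evaluate the network on the collocation points, without building a graph
    t_np = t_physics.detach().cpu().numpy().flatten()
    with torch.no_grad():
        v_pred = pinn(t_physics).cpu().numpy().flatten()

    plt.plot(sol_t, sol_v, color="grey", label="solve_ivp (true solution)")
    plt.plot(t_np, v_pred, color="green", label="PINN output")
    plt.scatter(t_np, v_pred, s=8, color="green", label="t_physics points")
    plt.title(f"Iteration {i}")
    plt.xlabel("t in s")
    plt.ylabel("v_I in °C")
    plt.legend()
    plt.show()
```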

Below, I share some training progress pictures, with `lambda1 = 1e3` and a learning rate for my Adam optimizer of `0.001`.

*Initialization:*
[Initialization](https://i.sstatic.net/gwbJPSnI.png)

*10,000 iterations*
[10,000 iterations](https://i.sstatic.net/0kA518UC.png)

*20,000 iterations*
[20,000 iterations](https://i.sstatic.net/AwHPIZ8J.png)

*50,000 iterations*
[50,000 iterations](https://i.sstatic.net/6Lb8AFBM.png)

As you can see, I am nowhere near my true solution of the ODE. What irritates me most, though, is that the plot of my residual seems to be 0 almost the entire time. The goal of the network should be to learn `v_I` and get `dVIdt` via automatic differentiation in order to make my `residual` go to 0. Why is it 0 from the very beginning - especially when I can see how wrong my network is at the very beginning? I really don't get it...

Weirdly, my approach is working perfectly fine for the Van der Pol oscillator, another ODE-described problem. I am really hitting a rock here - I don't know what to do. I am happy about every tip you guys might have!

I played around with the `lambda1` value so far, but did not achieve a significant improvement. However, I managed to get a bigger curvature (as you can see slightly in the pictures of 20,000 and 50,000 iterations). I also tried different optimizers: Adam (with learning rates 0.0035 and 0.001, and even switching after some iterations), L-BFGS, Adagrad, and a combination of Adam first and L-BFGS after.

I revised the ODE several times, but couldn't find an error.

If you are interested, you can find the notebook [here](https://colab.research.google.com/drive/1OQl22996hWc0g8J9mgVBO1PKgSvYmjPT?usp=sharing). Please note that the notebook is written in German.
