Physics-Informed Neural Network is not learning the underlying physics problem at all

Thread starter: Lukas Hierzegger
I am trying to train a Fully Connected Neural Network (FCN) to learn the behavior of an Ordinary Differential Equation (ODE).

Initial Problem


My ODE describes the behavior of a heating system and is as follows:

Code:
# Model for the differential equation
def model(t, v_I):
    # heating term q_H only becomes active at t >= T_0; u and v_A are the constant control input and disturbance (u_t and v_A_t below)
    q_H = (q_max / (u_0 * c_I)) * u if t >= T_0 else 0
    dvi_dt = (-alpha * A / c_I) * (v_I - v_A) + q_H
    return dvi_dt
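
Written out, the ODE is dv_I/dt = -(alpha * A / c_I) * (v_I(t) - v_A) + q_H(t), where q_H(t) = (q_max / (u_0 * c_I)) * u for t >= T_0 and 0 otherwise.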

As initial condition I am using v_I_0 = 0. My parameters are:

Code:
# Parameters
alpha = 3  # W/m^2K
A = 20     # m^2
c_I = 61250  # J/K
q_max = 1000  # W
u_0 = 200  # V
T_0 = 300  # s
v_I0 = 0  # Celsius

My disturbance value is v_A_t and my control input is u_t (the voltage of my heating system in V); both are constant for now:

Code:
# Disturbance v_A(t) in Celsius
v_A_t = -15
# Control input u(t) in V
u_t = 200.0

However, in order to make it differentiable, I replaced the step in the q_H term with a smooth tanh version:

Code:
# Smooth version of q_H(t) using tanh
def smooth_q_H(t, T_0, q_max, c_0, c_I, c_constant):
    epsilon = 10.0  # smoothing parameter, can be adjusted
    return (q_max / (c_0 * c_I)) * c_constant * (0.5 * (torch.tanh(epsilon * (t - T_0)) + 1))

If I solve the ODE with solve_ivp from scipy.integrate I get the following plot: https://i.sstatic.net/WqLMQIwX.png
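
For reference, the solver call looks roughly like this (a sketch; the time span of 0 to 4500 s matches the t_physics domain below, the exact call in the notebook may differ):

Code:
from scipy.integrate import solve_ivp
import numpy as np

# map the constant disturbance and control input onto the names used inside model()
v_A, u = v_A_t, u_t

# integrate the ODE from the initial condition v_I0 over the full time domain
t_eval = np.linspace(0, 4500, 300)
sol = solve_ivp(model, (0, 4500), [v_I0], t_eval=t_eval)
# sol.t holds the time points, sol.y[0] the corresponding v_I values (the grey reference curve below)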

Training setup


Next, I defined a Fully Connected Neural Network with 1 input layer, 3 hidden layers with 32 neurons each, and 1 output layer. The input is my time t, the output is my v_I. As activation function I use tanh, and I use Xavier weight initialization.
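
The network itself looks roughly like this (a sketch of what I just described; the actual class in the notebook may differ in details, device is defined elsewhere):

Code:
import torch
import torch.nn as nn

class PINN(nn.Module):
    def __init__(self, n_hidden=32):
        super().__init__()
        # 1 input (t), 3 hidden layers with 32 neurons and tanh, 1 output (v_I)
        self.net = nn.Sequential(
            nn.Linear(1, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, 1),
        )
        # Xavier weight initialization
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, t):
        return self.net(t)

pinn = PINN().to(device)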

I use an Adam optimizer with learning rate 0.001, and I have a lambda value to weight the physics loss later on.
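
The corresponding setup is roughly (values as stated above, the exact code in the notebook may differ):

Code:
optimiser = torch.optim.Adam(pinn.parameters(), lr=0.001)
lambda1 = 1e3  # weighting factor for the physics loss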

My loss function is described as follows:

Code:
# Function to compute the losses
def calculate_losses():
    # Initial loss
    v_I_0 = pinn(t_boundary)
    loss1 = (torch.squeeze(v_I_0) - 0)**2

    # Physics loss
    v_I_pinn = pinn(t_physics)
    dvIdt_pinn = torch.autograd.grad(v_I_pinn, t_physics, torch.ones_like(v_I_pinn), create_graph=True)[0]
    residual = dvIdt_pinn - ((alpha * A) / c_I) * (v_A_constant - v_I_pinn) - (q_max / (c_0 * c_I)) * c_constant * (0.5 * (torch.tanh(10.0 * (t_physics - T_0)) + 1))
    loss2 = torch.mean(residual ** 2)

    loss = loss1 + lambda1 * loss2
    return loss, loss1, loss2, v_I_0

First, I compute my initial boundary loss (loss1). As stated above, I want v_I_0 = 0. For the initial loss, I define an initial t-set that basically consists of just one point (0). I compute v_I_0 by passing that t-point to the network and taking its output.
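
t_boundary is essentially just this single point (sketch):

Code:
# single boundary point t = 0 for the initial-condition loss
t_boundary = torch.tensor([[0.0]], device=device)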

Next, I compute my physics loss (loss2). To do that, I rearrange my ODE so that it equals 0 and store that value in the variable residual (spoiler: I feel like there must be a type-casting error or something similar here, but the ODE itself should not be the problem). I compute the residual over my t_physics tensor, which is defined as follows:

Code:
# define training points over the entire domain, for the physics loss
N_col = 300
t_physics = torch.linspace(1, 4500, N_col, device=device).view(-1, 1).requires_grad_(True)

Afterwards, I return the values loss, loss1, loss2 and v_I_0, mostly for logging purposes.

Training


The actual training is quite straightforward: I just let it run for a certain number of iterations. Usually I cancel the training once I feel there is no more improvement, though. The longest I let the FCN train was about 8 hours with CUDA.

Code:
# Initialize the losses
loss1, loss2 = 0, 0

for i in tqdm(range(500001), desc="Processing"):
    loss, loss1, loss2, v_I_0 = calculate_losses()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

Training progress


I plot the training progress every 5,000 iterations. In the plot I display my t_physics point set (green dots), the output of my network (green line), the solver output, i.e. the true solution of the ODE (grey line), and my residual (blue line).
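
The plot itself is produced roughly like this (a sketch, assuming matplotlib and that sol holds the solve_ivp reference solution from above):

Code:
import matplotlib.pyplot as plt

with torch.no_grad():
    v_I_pred = pinn(t_physics).cpu().numpy().flatten()
t_np = t_physics.detach().cpu().numpy().flatten()

plt.plot(sol.t, sol.y[0], color="grey", label="true solution (solve_ivp)")
plt.plot(t_np, v_I_pred, color="green", label="PINN output")
plt.scatter(t_np, v_I_pred, color="green", s=5, label="t_physics points")
# the residual (blue line) is drawn the same way from the residual tensor computed in calculate_losses()
plt.legend()
plt.show()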

Below, I share some pictures of the training progress, with lambda1 = 1e3 and a learning rate of 0.001 for my Adam optimizer.

Initialization: https://i.sstatic.net/gwbJPSnI.png

10,000 iterations: https://i.sstatic.net/0kA518UC.png

20,000 iterations: https://i.sstatic.net/AwHPIZ8J.png

50,000 iterations: https://i.sstatic.net/6Lb8AFBM.png

As you can see, I am nowhere near the true solution of the ODE. What irritates me most, though, is that the plot of my residual seems to be 0 almost the entire time. The goal of the network should be to learn v_I and get dvIdt via automatic differentiation in order to make my residual go to 0. Why is it 0 from the very beginning - especially when I can see how wrong my network is at the very beginning? I really don't get it...

Weirdly, my approach works perfectly fine for the Van der Pol oscillator, another problem described by an ODE. I am really hitting a wall here - I don't know what to do. I am happy about every tip you might have!

So far I have played around with the lambda1 value, but did not achieve a significant improvement. However, I managed to get a bigger curvature (as you can see slightly in the pictures at 20,000 and 50,000 iterations).

I also tried different optimizers: Adam (with learning rates 0.0035 and 0.001, and even switching the rate after some iterations), L-BFGS, Adagrad, and a combination of Adam first and L-BFGS afterwards.

I revised the ODE several times, but couldn't find an error.

If you are interested, you can find the notebook here: https://colab.research.google.com/drive/1OQl22996hWc0g8J9mgVBO1PKgSvYmjPT?usp=sharing

Please note that the notebook is written in German.