
Physics-informed Neural Networks



Summary

Physics-informed neural networks (PINNs) are a class of deep learning models that incorporate physical laws—typically expressed as partial differential equations (PDEs)—directly into the training process. Instead of relying solely on data, PINNs leverage automatic differentiation to enforce that the neural network outputs satisfy the governing equations of the system.


General Recipe

Let \(\theta \in \Theta\) denote the parameters of a neural network, and let \(u_\theta : \mathcal{Y} \to \mathbb{R}^d\) be a deep neural network (DNN) that approximates the true physical field \(u : \mathcal{Y} \to \mathbb{R}^d\), which is an (unknown) solution to a PDE defined on the domain \(\mathcal{Y} \subset \mathbb{R}^m\).

We aim to find parameters \(\theta^\ast\) such that \(u_\theta \approx u\). Formally, this can be expressed as

\[ \theta^\ast \in \arg\min_{\theta \in \Theta} \|u_\theta - u\|_p, \]

where the error norm is defined as

\[ \|u_\theta - u\|_p \coloneqq \left( \int_\mathcal{Y} |u(y) - u_\theta(y)|^p \; dy \right)^{1/p}. \]

Since the true solution \(u\) is generally unknown, we cannot compute this loss directly. However, we typically know that \(u\) satisfies a PDE of the form

\[ \mathcal{D}(u(y)) = f(y), \quad y \in \mathcal{Y}, \]

where \(\mathcal{D}\) is a (possibly nonlinear) differential operator, and \(f : \mathcal{Y} \to \mathbb{R}^d\) is a known source term.
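
For instance (an illustrative example, not taken from the post), the one-dimensional heat equation with diffusivity \(\kappa > 0\) fits this template with \(y = (t, x)\) and

\[ \mathcal{D}(u) = \partial_t u - \kappa\, \partial_{xx} u, \qquad f \equiv 0, \]

so that \(\mathcal{D}(u(y)) = f(y)\) is exactly \(\partial_t u = \kappa\, \partial_{xx} u\).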

Because \(u_\theta\) is differentiable with respect to its inputs, we can use automatic differentiation (AD) to compute \(\mathcal{D}(u_\theta(y))\). This allows us to define the PDE residual

\[ \mathcal{R}_\theta(y) \coloneqq \mathcal{D}(u_\theta(y)) - f(y), \]

for all \(y \in \mathcal{Y}\).

To encourage \(u_\theta\) to satisfy the PDE, we minimize the integrated residual:

\[ \mathcal{L}_{\text{PDE}}(\theta) \coloneqq \int_\mathcal{Y} |\mathcal{R}_\theta(y)|^p \; dy. \]

In practice, this integral is approximated by a numerical quadrature over \(N \in \mathbb{N}\) collocation points \(\{y_i\}_{i=1}^N\), yielding

\[ \mathcal{L}_{\text{PDE}}(\theta) \approx \sum_{i=1}^N w_i \, |\mathcal{R}_\theta(y_i)|^p, \]

where \(w_i > 0\) are quadrature weights (commonly \(w_i = 1/N\)) and typically \(p=2\) (mean squared error). The network parameters \(\theta\) are then optimized via gradient-based methods such as Adam.
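
As a concrete sketch of this recipe (the framework, PyTorch, and the example PDE \(-u'' = \pi^2 \sin(\pi y)\) on \(\mathcal{Y} = (0, 1)\) are illustrative choices, not taken from the post), the residual and the collocation-point loss can be computed with automatic differentiation as follows:

```python
import torch
import torch.nn as nn

# Illustrative example (not from the post): 1D Poisson equation
#   D(u) = -u''(y) = f(y) = pi^2 * sin(pi * y)   on   Y = (0, 1).

u_theta = nn.Sequential(               # small MLP approximating u
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

def pde_residual(y: torch.Tensor) -> torch.Tensor:
    """R_theta(y) = D(u_theta(y)) - f(y), computed via automatic differentiation."""
    y = y.clone().requires_grad_(True)
    u = u_theta(y)
    # First and second derivatives of u with respect to the input y.
    du = torch.autograd.grad(u, y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, y, grad_outputs=torch.ones_like(du), create_graph=True)[0]
    f = torch.pi ** 2 * torch.sin(torch.pi * y)
    return -d2u - f

# Monte Carlo quadrature over N collocation points (w_i = 1/N, p = 2).
N = 1024
y_col = torch.rand(N, 1)               # uniform samples in (0, 1)
loss_pde = (pde_residual(y_col) ** 2).mean()
```

With uniformly sampled collocation points and \(w_i = 1/N\), the batch mean is a Monte Carlo estimate of the integral (up to the constant volume of \(\mathcal{Y}\)).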


Incorporating Data and Boundary Conditions

In many physical systems, we also have initial, boundary, or measurement data that the solution must satisfy. Splitting the network inputs into time, space, and parameters, let:

  • \(t \geq 0\) denote time,
  • \(x \in \mathcal{X}\) denote the spatial coordinates, and
  • \(y \in \mathcal{Y}\) denote parameters or input conditions.

Then the PINN is a function

\[ u_\theta : \mathbb{R}_{\geq 0} \times \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d, \quad (t, x, y) \mapsto u_\theta(t, x, y). \]
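
As a minimal sketch of such a network (the framework, PyTorch, and the layer sizes are illustrative choices, not prescribed by the post), a small multilayer perceptron can act on the concatenated inputs:

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """MLP u_theta(t, x, y) -> R^d acting on concatenated (t, x, y)."""

    def __init__(self, x_dim: int = 1, y_dim: int = 1, d_out: int = 1, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + x_dim + y_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, d_out),
        )

    def forward(self, t: torch.Tensor, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # t: (batch, 1), x: (batch, x_dim), y: (batch, y_dim)
        return self.net(torch.cat([t, x, y], dim=-1))
```

An instance is then evaluated as model(t, x, y) on batched tensors.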

Initial Conditions

If the true solution satisfies an initial condition \(u(0, x, y) = g(x, y)\), which at \(N_{\text{IC}}\) sampled points \(\{(x_i, y_i)\}_{i=1}^{N_{\text{IC}}}\) reads

\[ \begin{aligned} u(0, x_1, y_1) &= g(x_1, y_1) \\ &\;\vdots \\ u(0, x_{N_{\text{IC}}}, y_{N_{\text{IC}}}) &= g(x_{N_{\text{IC}}}, y_{N_{\text{IC}}}), \end{aligned} \]

we can enforce this constraint by adding an initial-condition loss term

\[ \mathcal{L}_{\text{IC}}(\theta) = \frac{1}{N_{\text{IC}}} \sum_{i=1}^{N_{\text{IC}}} |u_\theta(0, x_i, y_i) - g(x_i, y_i)|^2. \]

Boundary Conditions

Boundary conditions specify the behavior of the solution on the spatial boundary \(\partial\mathcal{X}\). They can be enforced by a loss term analogous to the initial-condition loss. For example, if the solution satisfies a boundary condition \(u(t, x, y) = h(t, x, y)\) for \((t, x, y) \in \mathbb{R}_{\geq 0} \times \partial\mathcal{X} \times \mathcal{Y}\), we sample \(N_{\text{BC}}\) boundary points \(\{(t_i, x_i, y_i)\}_{i=1}^{N_{\text{BC}}}\) with \(x_i \in \partial\mathcal{X}\) and define

\[ \mathcal{L}_{\text{BC}}(\theta) = \frac{1}{N_{\text{BC}}} \sum_{i=1}^{N_{\text{BC}}} |u_\theta(t_i, x_i, y_i) - h(t_i, x_i, y_i)|^2. \]

Observational Data

If we have observed data points \(\{(t_i, x_i, y_i, u_i)\}_{i=1}^{N_{\text{data}}}\), we can include a data loss

\[ \mathcal{L}_{\text{data}}(\theta) = \frac{1}{N_{\text{data}}} \sum_{i=1}^{N_{\text{data}}} |u_\theta(t_i, x_i, y_i) - u_i|^2. \]
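
These three data-fitting terms translate almost verbatim into code. The following minimal sketch assumes a model with the \((t, x, y)\) signature from the class above; the argument names and the convention that target values are passed in as tensors are illustrative.

```python
import torch

def ic_loss(model, x_ic, y_ic, g_ic):
    """L_IC: mean squared mismatch with the initial condition u(0, x, y) = g(x, y)."""
    t0 = torch.zeros(x_ic.shape[0], 1)
    return ((model(t0, x_ic, y_ic) - g_ic) ** 2).mean()

def bc_loss(model, t_bc, x_bc, y_bc, h_bc):
    """L_BC: mean squared mismatch with boundary values h at points x_bc on the boundary."""
    return ((model(t_bc, x_bc, y_bc) - h_bc) ** 2).mean()

def data_loss(model, t_obs, x_obs, y_obs, u_obs):
    """L_data: mean squared mismatch with observed values u_i."""
    return ((model(t_obs, x_obs, y_obs) - u_obs) ** 2).mean()
```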

Full Training Objective

The total PINN loss typically combines the PDE, initial-condition, boundary-condition, and data losses:

\[ \mathcal{L}(\theta) = \lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}}(\theta) + \lambda_{\text{IC}} \mathcal{L}_{\text{IC}}(\theta) + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}}(\theta) + \lambda_{\text{data}} \mathcal{L}_{\text{data}}(\theta), \]

where the weights \(\lambda_{\text{PDE}}, \lambda_{\text{IC}}, \lambda_{\text{BC}}, \lambda_{\text{data}} > 0\) balance the contributions of each term.

The network parameters are then optimized by minimizing the loss function:

\[ \hat{\theta} \approx \arg\min_{\theta \in \Theta} \mathcal{L}(\theta), \]

where \(\hat{\theta}\) denotes the parameters obtained by numerical optimization (e.g., using Adam).
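
Putting the pieces together, a schematic training loop might look as follows. It reuses the PINN model and the ic_loss / bc_loss / data_loss helpers sketched above; pde_loss is a placeholder for a problem-specific collocation-point residual loss (as in the General Recipe section), and the weights, step count, and learning rate are illustrative.

```python
import torch

def train_pinn(model, pde_loss, ic_batch, bc_batch, data_batch,
               lams=(1.0, 1.0, 1.0, 1.0), steps=10_000, lr=1e-3):
    """Minimize the weighted PINN objective with Adam (schematic sketch).

    `pde_loss(model)` is assumed to return the collocation-point residual loss;
    the *_batch tuples hold sampled points and target values for the IC, BC,
    and data terms, matching the helper signatures above.
    """
    lam_pde, lam_ic, lam_bc, lam_data = lams
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        optimizer.zero_grad()
        loss = (lam_pde * pde_loss(model)
                + lam_ic * ic_loss(model, *ic_batch)
                + lam_bc * bc_loss(model, *bc_batch)
                + lam_data * data_loss(model, *data_batch))
        loss.backward()
        optimizer.step()
    return model
```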


Acknowledgment

This post is based on material covered in the AI in the Sciences and Engineering (HS 2025) lecture by Prof. Dr. Siddhartha Mishra, Computational and Applied Mathematics Laboratory (CamLab), Seminar for Applied Mathematics (SAM), D-MATH, ETH AI Center, and Swiss National AI Institute (SNAI), ETH Zürich, Switzerland.