ODE Discretization

This note gives a brief historical and conceptual overview of ODE discretization. In its early development (18th–19th century), discretization—most notably Euler’s method—was introduced as a numerical tool to approximate solution trajectories of continuous-time systems whose closed-form solutions were unavailable. In the 20th century, the focus shifted: discretization began to be understood not as trajectory simulation, but as a way to design iterative algorithms whose goal is to find equilibria (fixed points) and study their stability. This note traces this shift in perspective and highlights the mathematical intuition behind viewing Euler discretization simultaneously as a time-stepping scheme and a fixed-point iteration.

--- 18th–19th Century: Discretization for Trajectory Approximation¶

In ODEs, we usually work with continuous-time systems: $$ \dot{x}(t) = f(x(t)) $$ where

  • $x(t)$: system state (varies with time)
  • $f(x)$: system dynamics (specifies the instantaneous direction of change)
  • $\dot{x}(t)$: time derivative, representing the rate of change

Here we regard the above ODE as a definition rather than a conclusion or computed result. The expression $\dot{x}(t) = f(x(t))$ means: we define a system with state $x(t)$, whose first-order time derivative is defined to be $f(x(t))$. The function $f$ is not derived from anything; it is chosen as part of the modeling process. For example, one can freely define $\dot{x}(t) = \log x(t)$, or $\dot{x}(t) = e^{x(t)}$, to model different systems. We choose, by design, "what direction the system changes when it is in state $x$"—this rule is $f$.

The continuous-time ODE describes how the system evolves at each instant—its direction and speed. Typical applications include modeling physical systems (particle motion, damping) and chemical reaction kinetics. At that time, what people were interested in was the trajectory $x(t)$ itself, which is obtained by solving the ODE.

Consider a simple nonlinear reaction kinetics model: $$ \dot{x}(t) = -x(t)^3 + x(t), $$ where $x(t)$ represents the concentration of a reactant, which is the quantity we care most about. The ODE can be "solved", but the solution comes out implicitly: no simple closed-form expression for $x(t)$ falls out. $$ \dot{x} = x - x^3 = x(1-x^2). $$ This is a first-order, autonomous, separable ODE: $$ \frac{dx}{x(1-x^2)} = dt $$ The left-hand side can be decomposed using partial fractions: $$ \frac{1}{x(1-x^2)} = \frac{1}{x} + \frac{1/2}{1-x} - \frac{1/2}{1+x} $$ Finally, integrating both sides gives $$ \log|x| - \frac12 \log|1-x^2| = t + C. $$ This is as far as solving the ODE directly takes us. With this expression we can, in principle, obtain $x(t)$ for any given $t$, but the process is painful: for every value of $t$, we plug it in and solve the equation $\log|x| - \frac12 \log|1-x^2| = \text{a number}$ for $x$.
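
To make the pain concrete, here is a sketch of what "solve the equation at every $t$" amounts to: a root-finder must run once per time point. The initial condition $x(0)=0.5$, the restriction to the branch $0 < x < 1$ (where the left-hand side is strictly increasing), and the bisection tolerances are all assumed for illustration.

```python
import math

def g(x):
    # Left-hand side of the implicit solution: log|x| - (1/2) log|1 - x^2|,
    # restricted to 0 < x < 1, where it is strictly increasing.
    return math.log(x) - 0.5 * math.log(1.0 - x * x)

def x_at(t, x0=0.5):
    """Recover x(t) by solving g(x) = t + C via bisection, where C = g(x0)."""
    target = t + g(x0)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):  # each iteration halves the bracket
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Every call to `x_at` is a full root-finding run—exactly the per-point effort the text describes.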

Of course, no one wants to solve a complicated equation at every value of $t$, and sometimes it is not even possible. Ideally we would have a closed-form solution, $x(t) = \text{(explicit function of } t\text{)}$. For some easy ODEs this exists: for $\dot{x}(t) = \lambda x(t)$, the solution is $x(t) = x(0) e^{\lambda t}$. But unfortunately, for most nonlinear ODEs no convenient closed-form solution is available. That is why we use discretization to approximate the trajectory $x(t)$.

Euler Method¶

Given a first-order ODE as defined above, $$ \dot{x}(t) = f(x(t)) $$ take a time step $\alpha > 0$ and define the discrete sequence $\{x_k\}_{k\ge0}$: $$ \boxed{ x_{k+1} = x_k + \alpha f(x_k) } $$ This is an approximation of the trajectory $x(t)$, called the explicit Euler method. The Euler method was introduced by Leonhard Euler in the mid-18th century. Its original purpose was to simulate solution trajectories of differential equations that could not be solved in closed form.

The Euler method can be understood as a first-order approximation of the solution trajectory $x(t)$ in time. If $x(t)$ is the exact solution of the ODE $\dot{x}(t)=f(x(t))$, then by Taylor expansion, $$ x(t+\alpha) = x(t) + \alpha \dot{x}(t) + O(\alpha^2) = x(t) + \alpha f(x(t)) + O(\alpha^2). $$ The Euler update $$ x_{k+1} = x_k + \alpha f(x_k) $$ is obtained by discarding the higher-order terms and advancing the state using this first-order approximation. As a result, the local truncation error is $O(\alpha^2)$. To simulate until a fixed final time $T$, about $T/\alpha = O(1/\alpha)$ steps are needed, leading to a global error of order $O(\alpha)$.
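
As a sanity check of this first-order behavior, here is a minimal Euler implementation, tested on $\dot{x} = -x$, $x(0)=1$ (an assumed test problem, chosen because the exact solution $e^{-t}$ is known). Halving the step size roughly halves the error at a fixed final time $T$:

```python
import math

def euler(f, x0, alpha, T):
    """Explicit Euler: x_{k+1} = x_k + alpha * f(x_k), stepping from t = 0 to t = T."""
    x = x0
    for _ in range(round(T / alpha)):
        x = x + alpha * f(x)
    return x

f = lambda x: -x  # test problem: xdot = -x, exact solution x(t) = e^{-t}
err_coarse = abs(euler(f, 1.0, 0.1, 2.0) - math.exp(-2.0))
err_fine = abs(euler(f, 1.0, 0.05, 2.0) - math.exp(-2.0))
# err_coarse / err_fine is close to 2: global error of order O(alpha).
```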

Equivalently, the Euler method arises by approximating the time derivative with a forward finite difference. Since $$ \dot{x}(t) = \lim_{\alpha\to 0} \frac{x(t+\alpha)-x(t)}{\alpha}, $$ replacing the limit by a finite step yields $$ \frac{x_{k+1}-x_k}{\alpha} \approx f(x_k), $$ which leads to the same update rule. These two interpretations are mathematically equivalent: the forward difference approximation of the derivative corresponds exactly to truncating the Taylor expansion at first order.

In particular, the Euler method performs well under the following conditions:

  • The step size $\alpha$ is sufficiently small, so that higher-order terms in the Taylor expansion are negligible.
  • The vector field $f$ is Lipschitz continuous, i.e., $$ |f(x)-f(y)| \le L |x-y| \quad \text{for all } x,y, $$ which ensures that the dynamics do not change too rapidly and that solutions depend smoothly on the initial conditions.

Under these assumptions, the discrete iterates produced by the Euler method approximate the continuous solution sampled at discrete times: $$ x_k \approx x(k\alpha), \quad \text{as } \alpha \to 0. $$ This approximation holds over any finite time interval as the step size decreases.

--- 20th Century: Equilibrium-Driven Discretization¶

In the 20th century, it was realized that for many continuous-time systems, what really matters is whether or not the system converges, so stability and equilibria of systems became the central object of study. This shift was driven by stability theory (Lyapunov, LaSalle) and by applications where long-term behavior matters more than transient dynamics.

Consider an autonomous ODE $$ \dot{x}(t) = f(x(t)) $$ A point $x^\star$ is an equilibrium if $$ f(x^\star) = 0 $$ The equilibrium is (asymptotically) stable if solutions starting near $x^\star$ satisfy $$ x(t) \to x^\star \quad \text{as } t \to \infty $$ In this setting, we no longer care about the full trajectory $x(t)$, but only whether it converges and where it converges to.

There are two remarks worth clarifying:

  • Note that equilibrium ≠ stability. An equilibrium point can be unstable.

    • An unstable equilibrium has a basin of attraction that contains only itself—if perturbed, the system never returns. For example, consider the ODE $\dot{x}=x$. The point $x^\star=0$ is an equilibrium since $f(0)=0$. The solution is $x(t)=x(0)e^{t}$. Only if $x(0)=0$ does the solution stay at 0. If $x(0)>0$, then $x(t)$ grows to $+\infty$ and never returns to $x^\star=0$; similarly, if $x(0)<0$, $x(t)$ diverges to $-\infty$.
    • A stable equilibrium is a limit point that attracts nearby trajectories, and the system gets closer to it as time evolves. For example, $\dot{x} = -x$. The point $x^\star=0$ is an equilibrium since $f(0)=0$. The solution is $x(t)=x(0)e^{-t}$. If $x(0)=0$, $x(t)$ stays at 0; if $x(0)\neq0$, then $x(t)$ approaches 0 as $t \to \infty$. It never reaches 0 exactly, but gets arbitrarily close.
  • A system can have multiple equilibria. When $t \to \infty$, which equilibrium the trajectory converges to is generally determined by the initial condition $x(0)$. Each stable equilibrium has its own basin of attraction, and the state will converge to the equilibrium whose basin contains the initial point.
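
Both remarks can be seen numerically on the earlier reaction model $\dot{x} = x - x^3$, which has three equilibria: $0$ (unstable) and $\pm 1$ (stable). A small Euler simulation (the step size and horizon are assumed values) shows that the sign of $x(0)$ picks the basin:

```python
def simulate(x0, alpha=0.01, steps=5000):
    """Euler-simulate xdot = x - x^3 from x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x = x + alpha * (x - x ** 3)
    return x

# Any x0 > 0 flows to the stable equilibrium +1, any x0 < 0 to -1;
# only x0 = 0 stays at the unstable equilibrium 0.
```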

Naturally, one expects that if an equilibrium of a continuous system is stable, a suitable discretization should preserve this stability. We'll see in the following notes which condition is needed for a continuous system to be stable, and what a "suitable" discretization that preserves stability looks like. At this stage, discretization remains a foundational mathematical tool, but it is now used to design convergent algorithms for finding equilibria, rather than just approximating the solution trajectory $x(t)$.

Stability Condition for Continuous Systems¶

This is not about discretization per se, but we discuss the stability condition for a continuous system here so that we can later see how a proper discretization preserves it.

Consider the continuous system $\dot x = f(x)$, and let $x^\star$ be an equilibrium point, i.e. $f(x^\star)=0$. Define the error (or deviation) variable $$ y := x - x^\star $$ If we can show $y(t) \to 0$ as $t \to \infty$, then it follows immediately that $x(t) \to x^\star$. Why do we make this conversion and shift the focus to $y$? Showing that something converges to 0 is generally easier, and we'll see that the error system $y$ has several nice properties that make the convergence analysis possible.

Notice that the error variable $y$ forms another continuous system. We can differentiate $y = x - x^\star$ with respect to time to give the dynamics of the system ($x^\star$ is constant) $$ \dot y = \dot x = f(x^\star + y) $$

Since $f(x^\star) = 0$, we can see $y=0$ is actually an equilibrium of the error system. Therefore, stability of $x^\star$ for the original system is equivalent to stability of $y=0$ for the error system.

Assume $f$ is continuously differentiable in a neighborhood of $x^\star$. Apply the Taylor expansion of $f$ at $x^\star$, using the fact $f(x^\star)=0$: $$ f(x^\star + y) = f(x^\star) + Df(x^\star)y + O(\|y\|^2) = Jy + O(\|y\|^2) $$ where $$J := Df(x^\star)$$ is the Jacobian matrix. The term $O(\|y\|^2)$ is of the same order as $\|y\|^2$, so as $y\rightarrow 0$ it satisfies $\lim_{y \to 0} \frac{O(\|y\|^2)}{\|y\|} = 0$. This means that near the equilibrium, $O(\|y\|^2)$ is of higher order than the linear term and becomes negligible compared to $Jy$.

Thus, when sufficiently close to the equilibrium, the error system dynamics can now be written as $$\dot y = Jy$$

This ODE has a closed-form solution: $$ y(t) = e^{tJ} y(0) $$ Therefore, $y(t) \to 0$ if and only if $\|e^{tJ}\| \to 0$ as $t \to \infty$. Diagonalize $J$ (assuming, for simplicity, that $J$ is diagonalizable): $$ J = V \Lambda V^{-1} \quad \Rightarrow \quad e^{tJ} = V e^{t\Lambda} V^{-1}, \quad \text{where } e^{t\Lambda} = \mathrm{diag}(e^{\lambda_1 t},\dots,e^{\lambda_n t}) $$ So, $$ \|e^{tJ}\|\to 0 \iff |e^{\lambda_i t}|\to 0 , \forall i \iff \operatorname{Re}(\lambda_i)<0 , \forall i $$ (In the general Jordan case, each exponential picks up a polynomial factor in $t$, which does not change this condition.)

Therefore, for a continuous system $\dot x = f(x)$, an equilibrium $x^\star$ is asymptotically stable iff the Jacobian matrix $J = Df(x^\star)$ satisfies $$ \boxed{ \operatorname{Re}(\lambda_i(J)) < 0 \quad \text{for all eigenvalues } \lambda_i } $$
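
This criterion is easy to check numerically. A sketch using numpy; the example matrices are assumptions: the scalar Jacobians come from the reaction model $f(x) = x - x^3$, whose derivative is $1 - 3x^2$, and the $2\times 2$ matrix is a lightly damped oscillator.

```python
import numpy as np

def is_asymptotically_stable(J):
    """Continuous-time test: every eigenvalue of the Jacobian has Re < 0."""
    return bool(np.all(np.linalg.eigvals(J).real < 0))

# For f(x) = x - x^3, f'(x) = 1 - 3x^2:
J_at_1 = np.array([[-2.0]])  # Jacobian at x* = 1 -> stable
J_at_0 = np.array([[1.0]])   # Jacobian at x* = 0 -> unstable
```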

Discretization as Fixed-Point Iteration¶

The Euler method did not change! Apply the explicit Euler method with step size $\alpha > 0$: $$ \boxed{ x_{k+1} = x_k + \alpha f(x_k) } $$ In the 18th–19th century view, this was an approximation of $$ x(k\alpha) \approx x_k $$ In the 20th century view, this same update is reinterpreted as a fixed-point iteration. Define the discrete map $$ T(x) := x + \alpha f(x) $$ Then $$ x_{k+1} = T(x_k), $$ and fixed points of the iteration satisfy $$ x = T(x) \iff f(x) = 0 $$ Thus: Equilibria of the continuous system are fixed points of the discrete iteration.
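
A tiny check of this equivalence, reusing the running example $f(x) = x - x^3$ (the step size $\alpha = 0.1$ is an assumed value): at each equilibrium, the map $T$ leaves the point unchanged; elsewhere, it moves it.

```python
def f(x):
    return x - x ** 3           # continuous dynamics; zeros at -1, 0, +1

def T(x, alpha=0.1):
    return x + alpha * f(x)     # discrete Euler map

# f(x) = 0  <=>  T(x) = x: equilibria are exactly the fixed points.
```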

Will Fixed-Point Iteration Converge to $x^\star$?¶

We've seen that the equilibria of the continuous system are the fixed points of the discrete iteration. But what about the other direction: if we keep applying the fixed-point iteration, will it always converge to an equilibrium? This is equivalent to asking: if the continuous system is stable near an equilibrium, does discretization preserve this stability?

Here are some examples showing that fixed-point iteration does not always converge. (Note that with exact Euler discretization, the fixed points of the discrete system coincide with the equilibria of the continuous system. The iteration may therefore diverge or oscillate, but it can never converge to a point other than an equilibrium of the continuous system.)

  • Counterexample 1: Oscillation without convergence (neither diverging nor converging)

    Consider the continuous system $\dot{x} = -x$. The equilibrium is $x^\star=0$, and the continuous-time solution is $x(t)=x(0)e^{-t}\to 0$. The Euler discretization gives $$x_{k+1} = x_k - \alpha x_k = (1-\alpha)x_k$$ For $\alpha=2$, the Euler iteration becomes $x_{k+1} = -x_k$, so $x_k = (-1)^k x_0$, which oscillates between $+x_0$ and $-x_0$ forever and never converges.

  • Counterexample 2: Euler step size too large, leads to divergence

    Again, for $\dot{x}=-x$. Take $\alpha = 3, x_0 = 1$. Then the iterates are: $$ \begin{align} x_1 &= (1-3)\cdot 1 = -2 \nonumber\newline x_2 &= (1-3)\cdot(-2) = 4 \nonumber\newline x_3 &= (1-3)\cdot 4 = -8 \nonumber\newline x_4 &= (1-3)\cdot(-8) = 16 \nonumber \end{align} $$ The trajectory alternates sign and doubles in magnitude at each step, moving farther and farther away from the equilibrium $x^\star=0$.

    When $|1-\alpha|>1$ (equivalently $\alpha>2$), each Euler update amplifies the distance to the equilibrium, so the fixed-point iteration diverges. When $|1-\alpha|<1$ (i.e. $0<\alpha<2$), each Euler update contracts the distance to the equilibrium and the iteration converges to 0.
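
All three regimes—convergence, perpetual oscillation, divergence—can be reproduced in a few lines; the test step sizes $0.5$, $2$, and $3$ are the assumed values:

```python
def euler_iterates(alpha, x0=1.0, k=50):
    """Iterate x_{k+1} = (1 - alpha) * x_k (Euler for xdot = -x); return |x_k|."""
    x = x0
    for _ in range(k):
        x = (1.0 - alpha) * x
    return abs(x)

# alpha = 0.5: |x_50| -> 0 (contracts);  alpha = 2: |x_50| = 1 (oscillates);
# alpha = 3:   |x_50| = 2^50 (each step doubles the distance to 0).
```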

From the examples above, we can see that the step size $\alpha$ plays an important role in discretization. Next, we'll see what kind of discretization preserves stability. No surprise: it's mainly about choosing a proper $\alpha$.

Fixed-Point Iteration Convergence Condition (Stability Condition for Discretized Systems)¶

Our goal is to prove that there exists $\alpha>0$ such that if $x_0$ is sufficiently close to $x^\star$, the sequence generated by the Euler iteration $x_{k+1} = x_k + \alpha f(x_k)$ satisfies $x_k \to x^\star$. The method is analogous to analyzing stability for continuous systems: we show that the error variable converges to zero. As in the continuous case, define the error $$ y_k := x_k - x^\star $$ Substituting into the algorithm, we obtain a nonlinear discrete dynamical system: $$ \begin{align} y_{k+1} &= x_{k+1} - x^\star \nonumber\newline &= x_k + \alpha f(x_k) - x^\star \nonumber\newline &= (x^\star + y_k) + \alpha f(x^\star + y_k) - x^\star \nonumber\newline &= y_k + \alpha f(x^\star + y_k) \nonumber \end{align} $$ If we can show $y_k$ converges to 0, this is equivalent to $x_k \to x^\star$.

Following the continuous-system approach, expand $f(x^\star + y_k)$ as a Taylor series at $x^\star$. This decomposes the term into a linear transformation of $y_k$ plus a higher-order term, allowing linearization and the use of linear-algebra tools to prove convergence. $$ f(x^\star + y_k) = Df(x^\star) y_k + O(\|y_k\|^2) $$ Let $J := Df(x^\star)$ be the Jacobian matrix. Substituting back, we get $$ y_{k+1} = (I + \alpha J)y_k + O(\|y_k\|^2) $$ As $y_k \to 0$, the term $O(\|y_k\|^2)$ is of higher order than the linear term and can be neglected relative to $(I + \alpha J)y_k$. Thus, near the equilibrium, the discrete dynamical system becomes a linear system: $$ y_{k+1} = (I + \alpha J)y_k $$

For any discrete linear system $y_{k+1} = Ay_k$, we have $$ y_k \to 0\ \text{(for any initial value)} \quad \Longleftrightarrow \quad \rho(A)<1 $$ where $\rho(\cdot) = \max_i |\lambda_i(A)|$ is the spectral radius, i.e., the largest modulus among the eigenvalues of $A$. This means that each eigen-direction of the system contracts, ensuring global convergence of the iteration.

Substituting $A = I + \alpha J$, we get that $y_k \rightarrow 0$, i.e., $x_k \to x^\star$, if and only if, for the Jacobian $J = Df(x^\star)$, $$ \rho(I + \alpha J) < 1 $$ i.e. $$ \boxed{ |1 + \alpha \lambda_i(J)| < 1 \quad \text{for all } i } $$

Together with the condition $\alpha > 0$ already required by the Euler discretization, this yields a step-size restriction. When all eigenvalues of $J$ are real and negative, it reads $$ 0 < \alpha < \frac{2}{|\lambda_{\max}|} $$ where $\lambda_{\max}$ is the eigenvalue of largest magnitude.
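
A quick numerical confirmation of the threshold, using an assumed diagonal Jacobian with real eigenvalues $-1$ and $-4$, so the predicted bound is $\alpha < 2/4 = 0.5$:

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue modulus of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

J = np.diag([-1.0, -4.0])  # real eigenvalues; |lambda_max| = 4
I = np.eye(2)
rho_inside = spectral_radius(I + 0.49 * J)   # just inside alpha < 0.5
rho_outside = spectral_radius(I + 0.51 * J)  # just outside alpha < 0.5
# rho_inside < 1 (iteration contracts), rho_outside > 1 (iteration diverges).
```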

Proof: $y_k \to 0$ for any initial value if and only if $\rho(A)<1$.

Let $$ y_{k+1} = Ay_k \quad \Rightarrow \quad y_k = A^k y_0. $$

  • (⇒) If $\rho(A)<1$, then $A^k \to 0$.

Consider the Jordan decomposition $A=VJV^{-1}$, where $J$ is a direct sum of Jordan blocks with eigenvalues $\lambda$. Then: $$ A^k = V J^k V^{-1} $$ For a Jordan block $B = \lambda I + N$ ($N$ is a strictly upper-triangular nilpotent matrix, $N^m=0$): $$ B^k = (\lambda I + N)^k = \sum_{r=0}^{m-1} \binom{k}{r}\lambda^{k-r}N^r $$ So, with $m$ the block size, there exists a constant $C$ such that: $$ |B^k| \leq C\,k^{m-1}|\lambda|^k $$ If $|\lambda|<1$, then $k^{m-1}|\lambda|^k \to 0$ (exponential decay dominates any polynomial growth).

    If $\rho(A)<1$, all eigenvalues satisfy $|\lambda|<1$, so all Jordan blocks $B^k \to 0$, hence $$ |A^k| \to 0 \quad \Rightarrow \quad y_k = A^k y_0 \to 0 $$

  • (⇐) If $\rho(A)\ge 1$, it is impossible for $y_k$ to converge to 0 for all initial values.

    If $\rho(A)\ge 1$, there exists an eigenvalue $\lambda$ with $|\lambda|\ge 1$ and a corresponding eigenvector $v\neq 0$ so that $$ Av = \lambda v $$ Taking $y_0 = v$, we have $$ y_k = A^k v = \lambda^k v $$ Thus $|y_k| = |\lambda|^k |v| \not\to 0$ (if $|\lambda|=1$ it does not decay; if $|\lambda|>1$ it diverges).

    Therefore, $y_k\to 0$ for all initial values only if $\rho(A)<1$.
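
The Jordan-block bound can also be seen numerically: for a non-diagonalizable block with $|\lambda| = 0.9 < 1$ (an assumed example), the powers $\|A^k\|$ first grow because of the nilpotent part (the $\binom{k}{r}$ factor), but eventually decay, since exponential decay dominates polynomial growth.

```python
import numpy as np

# 2x2 Jordan block with eigenvalue 0.9: not diagonalizable, rho(A) = 0.9 < 1.
A = np.array([[0.9, 1.0],
              [0.0, 0.9]])

def power_norm(A, k):
    """Frobenius norm of A^k."""
    return float(np.linalg.norm(np.linalg.matrix_power(A, k)))

# power_norm grows at first (transient from the nilpotent part), then -> 0.
```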

Example: Gradient Flow¶

Let $V : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. Consider the gradient flow: $$ \dot{x}(t) = -\nabla V(x(t)) $$

Equilibria satisfy $$ \nabla V(x^\star) = 0 $$ Along trajectories, $\frac{d}{dt} V(x(t)) = -\|\nabla V(x(t))\|^2 \le 0$. By the stability condition, the strict local minima of $V$ (those with positive definite Hessian) are asymptotically stable equilibria.

Applying Euler discretization gives $$ x_{k+1} = x_k - \alpha \nabla V(x_k) $$

This is gradient descent. At this point, discretization is no longer viewed as approximating $x(t)$. Instead: gradient descent is an algorithm designed to converge to stationary points of $V$.

How to choose a proper $\alpha$? We apply the stability condition $|1 + \alpha \lambda_i(J)| < 1$ here.

Let the iteration map be $$ F(x)=x-\alpha\nabla V(x) $$ whose Jacobian at $x^\star$ is $$ DF(x^\star)=I-\alpha\nabla^2 V(x^\star) $$ Local asymptotic stability of the fixed point requires $$ |1-\alpha\lambda_i(\nabla^2 V(x^\star))|<1, \quad \forall i $$ which is equivalent to $$ 0<\alpha\lambda_i(\nabla^2 V(x^\star))<2 $$ If $x^\star$ is a strict local minimizer with positive definite Hessian and $V$ is $L$-smooth ($\|\nabla V(x) - \nabla V(y)\| \le L \|x-y\|$), then $$ 0< \lambda_i(\nabla^2 V(x^\star))\le L $$ so a sufficient (uniform) condition is $$ 0<\alpha<\frac{2}{L} $$ Thus the standard step-size bound for gradient descent is precisely the linear stability condition specialized to $J=-\nabla^2V(x^\star)$, with $L$-smoothness providing a global upper bound on the spectrum.
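
As a concrete check of the $0<\alpha<2/L$ bound, take the assumed quadratic $V(x)=\frac12 x^\top A x$ with $A=\mathrm{diag}(1,10)$, so $\nabla V(x)=Ax$, the Hessian eigenvalues are $1$ and $10$, and $L=10$. Gradient descent converges just below $\alpha = 0.2$ and diverges just above it:

```python
import numpy as np

def gradient_descent(alpha, steps=200):
    """Run x_{k+1} = x_k - alpha * grad V(x_k) for V(x) = 0.5 x^T A x; return ||x||."""
    A = np.diag([1.0, 10.0])   # Hessian eigenvalues 1 and 10, so L = 10
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - alpha * (A @ x)
    return float(np.linalg.norm(x))

# alpha = 0.19 < 2/L = 0.2: converges to the minimizer x* = 0;
# alpha = 0.21 > 2/L: the stiff direction is amplified and the iterates blow up.
```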