Automatic Differentiation Using Dual Numbers: A Clear and Intuitive Understanding
In modern machine learning and scientific computing, one requirement comes up again and again:
computing derivatives accurately, efficiently, and automatically.
Whether we are training neural networks, optimizing engineering systems, or solving equations, derivatives are essential.
One of the most powerful tools developed for this purpose is Automatic Differentiation (AutoDiff).
Unlike symbolic differentiation (which manipulates formulas) or numerical differentiation (which estimates derivatives using finite differences), automatic differentiation gives us exact derivatives programmatically.
At the heart of one very elegant form of automatic differentiation lies an unexpected but beautiful idea: dual numbers.
In this post, we will build a deep and clear understanding:
- Why automatic differentiation is needed
- What dual numbers are
- How dual numbers allow us to compute derivatives automatically
- Why this method is both exact and efficient
Why Do We Need Automatic Differentiation?
Suppose you have a function:
\[f(x) = \sin(x^2) + \ln(x)\]and you want to compute its derivative $ f'(x) $ at a particular value of $x$.
You have three traditional options:
- Symbolic differentiation: manipulate the formula by hand (or using software like SymPy).
- Numerical differentiation: use finite differences:
\[f'(x) \approx \frac{f(x+h) - f(x)}{h}\]for small $ h $.
- Manual coding: write the derivative formula yourself.
Each has its problems:
- Symbolic differentiation can be very slow and memory-intensive for complex functions.
- Numerical differentiation is prone to rounding errors and instability.
- Manual coding is tedious and error-prone.
Automatic Differentiation offers a solution:
It programmatically computes exact derivatives, efficiently, at machine precision — without needing to symbolically manipulate formulas.
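To make the finite-difference problem concrete, here is a minimal sketch in plain Python (using the example function above and its hand-derived derivative $f'(x) = 2x\cos(x^2) + 1/x$ purely for comparison). It shows how the approximation error behaves as the step size $h$ shrinks: truncation error dominates for large $h$, and floating-point cancellation dominates for very small $h$.

```python
import math

# Example function from above and its hand-derived exact derivative (for comparison only).
def f(x):
    return math.sin(x**2) + math.log(x)

def f_prime_exact(x):
    return 2 * x * math.cos(x**2) + 1 / x

x = 1.5
for h in (1e-2, 1e-6, 1e-10, 1e-14):
    approx = (f(x + h) - f(x)) / h          # forward finite difference
    print(f"h = {h:.0e}   abs error = {abs(approx - f_prime_exact(x)):.3e}")
```

No choice of $h$ gives the exact answer; automatic differentiation removes the step size from the picture entirely.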
What Are Dual Numbers?
To understand how automatic differentiation works at a deep level, we need to understand dual numbers.
A dual number is an extension of the real numbers, just like complex numbers are.
A dual number looks like:
\[x + \epsilon x'\]where:
- $ x $ is the regular (real) part,
- $ x’ $ is the infinitesimal part (called the “dual part”),
- $ \epsilon $ is a special quantity satisfying:
\[\epsilon^2 = 0, \quad \epsilon \neq 0\]In other words:
- $ \epsilon $ is not zero, but its square is zero.
This strange behavior is what makes dual numbers powerful for automatic differentiation.
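For example, multiplying two dual numbers follows directly from $\epsilon^2 = 0$:
\[(a + a'\epsilon)(b + b'\epsilon) = ab + (a'b + ab')\epsilon + a'b'\epsilon^2 = ab + (a'b + ab')\epsilon\]Notice that the coefficient of $\epsilon$ is exactly the product rule — a first hint of what is coming.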
How Dual Numbers Capture Derivatives
The key insight is this:
If you evaluate a function $ f $ on a dual number $ x + \epsilon $,
then the result will be:
\[f(x+\epsilon) = f(x) + f'(x)\epsilon\]That is:
- The real part gives $ f(x) $,
- The dual part gives $ f'(x) $, the derivative.
This can be seen by Taylor expansion:
\[f(x+\epsilon) = f(x) + f'(x)\epsilon + \frac{1}{2}f''(x)\epsilon^2 + \cdots\]but since $ \epsilon^2 = 0 $, all higher-order terms vanish, and we are left with:
\[f(x+\epsilon) = f(x) + f'(x)\epsilon\]Thus, by evaluating the function on a dual number, the derivative “appears automatically” in the coefficient of $\epsilon$.
A Symbolic Example
Suppose we want to differentiate:
\[f(x) = x^2\]We introduce a dual number:
\[z = x + \epsilon\]Substituting:
\[f(z) = (x+\epsilon)^2 = x^2 + 2x\epsilon + \epsilon^2\]Since $ \epsilon^2 = 0 $, this simplifies to:
\[f(z) = x^2 + 2x\epsilon\]Thus:
- The real part is $x^2$,
- The dual part is $2x$.
The dual part coefficient $2x$ is exactly the derivative $f'(x)$.
No finite differences.
No symbolic manipulation.
Just simple, mechanical computation.
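The same computation can be carried out mechanically in code. Here is a minimal sketch in plain Python, representing a dual number as a hypothetical `(value, derivative)` pair, that reproduces the $f(x) = x^2$ example:

```python
# A dual number as a (real part, dual part) pair: value + derivative * epsilon.
def dual_mul(a, b):
    # (a0 + a1*eps)(b0 + b1*eps) = a0*b0 + (a0*b1 + a1*b0)*eps, because eps^2 = 0
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

x = 3.0
z = (x, 1.0)                         # z = x + epsilon, seeded with dx/dx = 1
value, derivative = dual_mul(z, z)   # f(z) = z^2
print(value, derivative)             # 9.0 6.0  ->  f(x) = x^2, f'(x) = 2x
```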
How Automatic Differentiation Works Internally
When you implement automatic differentiation using dual numbers:
- Each variable carries both its value and its derivative.
- Each operation (addition, multiplication, sin, cos, etc.) is overloaded to propagate both the value and the derivative correctly.
For example:
| Operation | Real Part (Value) | Dual Part (Derivative) |
|---|---|---|
| $ a + b $ | $ a + b $ | $ a' + b' $ |
| $ a \times b $ | $ ab $ | $ a' b + a b' $ |
| $ \sin(a) $ | $ \sin(a) $ | $ \cos(a) \times a' $ |
| $ \exp(a) $ | $ \exp(a) $ | $ \exp(a) \times a' $ |
Each rule is a direct application of the chain rule of calculus — implemented mechanically at each operation.
As a result:
- Complex functions can be differentiated automatically.
- Derivatives are exact up to machine precision.
- Efficiency is very high, with no need for tiny perturbations or symbolic trees.
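Below is a minimal sketch of how these rules can be packaged with operator overloading. The `Dual` class and the `sin`/`log` helpers are illustrative names chosen for this post, not the API of any particular library; the propagation rules are exactly the ones in the table above.

```python
import math

class Dual:
    """A dual number: value + deriv * epsilon, with epsilon^2 = 0."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (ab)' = a'b + ab'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def sin(a):
    # chain rule: (sin a)' = cos(a) * a'
    return Dual(math.sin(a.value), math.cos(a.value) * a.deriv)

def log(a):
    # chain rule: (ln a)' = a' / a
    return Dual(math.log(a.value), a.deriv / a.value)

# Differentiate f(x) = sin(x^2) + ln(x) at x = 1.5 in a single pass.
x = Dual(1.5, 1.0)                   # seed the dual part with dx/dx = 1
y = sin(x * x) + log(x)
print(y.value)                       # f(1.5)
print(y.deriv)                       # f'(1.5) = 2*1.5*cos(1.5**2) + 1/1.5
```

Note that the derivative of the whole expression falls out of the very same pass that computes its value.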
A Simple Schematic: How AutoDiff Propagates Derivatives
When we apply automatic differentiation, every variable carries not just its value, but also its derivative.
At each function composition step, the chain rule is applied automatically.
For a composite function $ y = f(g(x)) $, AutoDiff proceeds like this:
```
(x, dx/dx = 1)
      ↓
   [ g(x) ]
      ↓
 [ f(g(x)) ]
      ↓
 (y, dy/dx)
```
- At $x$, the derivative $dx/dx = 1$.
- After $g$, the value becomes $g(x)$ and the derivative becomes $dg/dx$.
- After $f$, the value becomes $f(g(x))$ and the derivative becomes:
\[\frac{dy}{dx} = f'(g(x)) \cdot g'(x)\]
Each stage mechanically applies the chain rule — no symbolic algebra, no approximations.
Thus, AutoDiff ensures that complex nested functions can be differentiated exactly, step-by-step, through simple local operations.
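Using the hypothetical `Dual` class sketched earlier, the schematic can be traced directly. With $g(x) = x^2$ and $f(u) = \sin(u)$, every intermediate carries its (value, derivative) pair:

```python
# Tracing y = f(g(x)) with g(x) = x^2 and f(u) = sin(u),
# reusing the Dual class and sin() helper from the sketch above.
x = Dual(2.0, 1.0)       # (x, dx/dx = 1)
u = x * x                # (g(x), dg/dx)    -> (4.0, 4.0)
y = sin(u)               # (f(g(x)), dy/dx) -> (sin(4), cos(4) * 4)
print(u.value, u.deriv)
print(y.value, y.deriv)  # matches f'(g(x)) * g'(x) = cos(x^2) * 2x
```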
Why This Method is Beautiful
By extending numbers to dual numbers,
we unify function evaluation and derivative computation into a single process.
Instead of treating differentiation as an afterthought,
it becomes a natural part of computation.
This perspective is not only elegant,
it is one of the foundations of modern deep learning frameworks such as TensorFlow and PyTorch (internally they rely on reverse-mode differentiation, but the conceptual spirit remains the same).
Dual numbers are one of the purest ways to understand how machine learning software “magically” computes gradients behind the scenes.
Conclusion
Automatic differentiation using dual numbers reveals a deep and beautiful structure behind differentiation.
By simply extending real numbers slightly — into a realm where $\epsilon^2 = 0$ —
we are able to compute exact derivatives as a natural by-product of evaluating functions.
This is not a numerical trick.
It is a mathematical expansion of how we view numbers and functions.
Understanding this enriches not just how we see differentiation,
but also how we understand modern computational frameworks that power machine learning, optimization, and scientific computing today.
In the next post, we will explore how different modes of automatic differentiation (forward mode, reverse mode) connect back to these foundational ideas.
Note:
This post was developed with structured human domain expertise, supported by AI for clarity and organization.
Despite careful construction, subtle errors or gaps may exist. If you spot anything unclear or have suggestions, please reach out via our members-only chat — your feedback helps us make this resource even better for everyone!