Neural Networks as Functions: Building the Right Intuition
When we first encounter neural networks, they often seem mysterious: a web of interconnected nodes, weights, and activations. But if we strip away all the complexity, at the heart of it, a neural network is just a function — a very flexible and powerful one.
This perspective is not just a mathematical trick. It is the key to truly understanding how neural networks operate, why they are trained the way they are, and how they manage to approximate complex relationships between inputs and outputs.
In this post, we will build up this idea carefully:
- What does it mean to call a neural network a function?
- How do layers and activations work together to construct this function?
- What important mathematical properties do these functions have?
- Why does this viewpoint matter when thinking about training, generalization, and model design?
Neural Networks are Functions at Their Core
In mathematics, a function simply maps inputs to outputs:
\[f: X \rightarrow Y\]
It takes an input $ x \in X $ and produces an output $ y \in Y $.
In exactly the same way, a neural network defines a mapping:
\[\mathcal{N}: \mathbb{R}^n \rightarrow \mathbb{R}^m\]
It takes an input vector of size $ n $ and produces an output vector of size $ m $.
The only difference is that this mapping is parameterized by weights and biases, which we learn during training.
Thus, the network function can be written as:
\[y = \mathcal{N}(x; \theta)\]
where $ \theta $ represents all the adjustable parameters.
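To make this concrete, here is a minimal sketch in NumPy. The one-hidden-layer architecture, the layer sizes, and the Tanh activation are illustrative choices, not anything prescribed above; the point is simply that $\mathcal{N}(x; \theta)$ is an ordinary function of an input $x$ and a bag of parameters $\theta$.

```python
import numpy as np

def init_params(n_in=3, n_hidden=4, n_out=2, seed=0):
    """theta: all adjustable weights and biases, gathered in one dict."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(size=(n_out, n_hidden)), "b2": np.zeros(n_out),
    }

def network(x, theta):
    """N(x; theta): maps a vector in R^n to a vector in R^m."""
    h = np.tanh(theta["W1"] @ x + theta["b1"])   # hidden layer
    return theta["W2"] @ h + theta["b2"]          # output layer

theta = init_params()
x = np.array([1.0, -0.5, 2.0])     # a point in R^3
y = network(x, theta)              # a point in R^2
print(y.shape)                     # (2,)
```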
How Neural Networks Construct Functions
A typical feedforward neural network builds complexity through layers, each performing two basic steps:
- A linear transformation: \(z = Wx + b\), where $ W $ is a matrix of weights and $ b $ is a bias vector.
- A nonlinear activation: \(a = \sigma(z)\), where $ \sigma $ is an activation function such as ReLU, Sigmoid, or Tanh.
By stacking multiple layers, the network creates a composition of functions:
\[\mathcal{N}(x) = \sigma_L\big(W_L \, \sigma_{L-1}(W_{L-1} \cdots \sigma_1( W_1 x + b_1 ) \cdots + b_{L-1}) + b_L \big)\]
Each layer reshapes and bends the input space a little more.
Together, they allow the network to model highly nonlinear relationships.
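As an illustrative sketch (the layer sizes, the ReLU choice, and the plain linear output layer are assumptions made for this example), the composition above is nothing more than a loop over (linear map, activation) pairs:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Apply a stack of (W, b) layers: ReLU after each hidden layer,
    and a plain linear map at the output."""
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)        # z = Wx + b, then a = sigma(z)
    W_out, b_out = layers[-1]
    return W_out @ a + b_out

rng = np.random.default_rng(1)
layers = [(rng.normal(size=(5, 3)), np.zeros(5)),   # R^3 -> R^5
          (rng.normal(size=(4, 5)), np.zeros(4)),   # R^5 -> R^4
          (rng.normal(size=(1, 4)), np.zeros(1))]   # R^4 -> R^1
print(forward(np.array([0.2, -1.0, 0.7]), layers))  # a single output value
```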
Without nonlinear activations, no matter how many layers we stack, the network would still represent only a linear function.
Nonlinearity is therefore essential for the expressiveness of neural networks.
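A quick numerical check of that claim, using arbitrary small matrices: two stacked linear layers with no activation in between collapse to a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Two stacked linear layers (no activation in between)...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))  # True
```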
Important Mathematical Properties: Continuity and Differentiability
When we think of a neural network as a function, certain mathematical properties become important.
Since the building blocks — linear transformations and typical activations — are continuous, the network as a whole is continuous.
This continuity ensures that small changes in input lead to small changes in output, which is critical for stable learning and prediction.
Moreover, training neural networks using gradient-based methods requires differentiability.
Linear maps are obviously differentiable, and most common activation functions (like Sigmoid and Tanh) are differentiable everywhere.
Even functions like ReLU, which are non-differentiable at a single point (zero), are handled gracefully using subgradients.
Thus, neural networks are almost everywhere differentiable, making them highly suitable for optimization using methods like backpropagation.
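As a small sanity check of this differentiability (using a deliberately tiny one-neuron sigmoid model, chosen purely for illustration), the analytic derivative given by the chain rule matches a finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(w, x=0.8, b=0.1):
    """Tiny 'network': one input, one sigmoid neuron."""
    return sigmoid(w * x + b)

w, x, b = 0.5, 0.8, 0.1

# Analytic derivative: sigma'(z) = sigma(z) * (1 - sigma(z)); chain rule multiplies by x.
z = w * x + b
analytic = sigmoid(z) * (1.0 - sigmoid(z)) * x

# Finite-difference estimate of d f / d w.
eps = 1e-6
numeric = (f(w + eps) - f(w - eps)) / (2 * eps)

print(abs(analytic - numeric) < 1e-8)  # True
```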
Seeing the Function in Action: Small Examples
When you train a neural network to predict house prices:
- Input: Area, number of bedrooms, location
- Output: Predicted price
the network defines a function:
\[\mathcal{N}: \mathbb{R}^3 \rightarrow \mathbb{R}\]
mapping features directly to outcomes.
When you train a network to classify handwritten digits:
- Input: Pixel values from an image
- Output: Probability scores for each digit class
the network implements a function:
\[\mathcal{N}: \mathbb{R}^{784} \rightarrow \mathbb{R}^{10}\]
No matter the size or complexity, a neural network remains, fundamentally, a function mapping inputs to outputs through layered transformations.
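A shape-level sketch of the digit-classifier mapping above; the hidden width of 128 and the softmax output are assumptions made for illustration, not part of the definition:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(128, 784)) * 0.01, np.zeros(128)   # R^784 -> R^128
W2, b2 = rng.normal(size=(10, 128)) * 0.01, np.zeros(10)     # R^128 -> R^10

x = rng.random(784)                        # flattened 28x28 image
probs = softmax(W2 @ np.maximum(0, W1 @ x + b1) + b2)
print(probs.shape, probs.sum())            # (10,), sums to 1.0
```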
A Symbolic Example: 2-2-2-1 Network
To understand this more clearly, consider a very small symbolic network:
- Two input variables: $ (x_1, x_2) $
- Two neurons in the first hidden layer
- Two neurons in the second hidden layer
- One neuron in the output layer
The first hidden layer computes two intermediate features:
\[h_1 = \sigma(w_{11} x_1 + w_{12} x_2 + b_1)\]
\[h_2 = \sigma(w_{21} x_1 + w_{22} x_2 + b_2)\]
These features are then passed to the second hidden layer, producing:
\[g_1 = \sigma(v_{11} h_1 + v_{12} h_2 + c_1)\]
\[g_2 = \sigma(v_{21} h_1 + v_{22} h_2 + c_2)\]
Finally, the output neuron produces the scalar output:
\[y = u_1 g_1 + u_2 g_2 + d\]
where:
- $w_{ij}, v_{ij}, u_i$ are weights,
- $b_i, c_i, d$ are biases,
- $\sigma$ is a nonlinear activation (such as ReLU or Tanh).
Putting it all together, the overall expanded symbolic expression for $ y $ is:
\[\begin{aligned} y = \, & u_1 \, \sigma\Big( v_{11} \, \sigma(w_{11} x_1 + w_{12} x_2 + b_1) \\ & + v_{12} \, \sigma(w_{21} x_1 + w_{22} x_2 + b_2) + c_1 \Big) \\ & + u_2 \, \sigma\Big( v_{21} \, \sigma(w_{11} x_1 + w_{12} x_2 + b_1) \\ & + v_{22} \, \sigma(w_{21} x_1 + w_{22} x_2 + b_2) + c_2 \Big) \\ & + d \end{aligned}\]
This shows explicitly that $y$ is a function of the inputs $x_1, x_2$, built by composing simple linear and nonlinear operations.
Every part of this network — linear combinations, nonlinear activations, summations — contributes to shaping the final mapping from input to output.
Even a tiny network like this behaves as a layered functional machine, extracting features, recombining them, and finally producing a prediction.
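Here is the same 2-2-2-1 network as a minimal code sketch, with arbitrary example values for the weights and biases and Tanh standing in for $\sigma$, confirming that the layer-by-layer computation agrees with the expanded expression:

```python
import numpy as np

sigma = np.tanh   # the nonlinear activation

# Arbitrary example parameters (weights and biases).
w11, w12, b1 = 0.4, -0.7, 0.1
w21, w22, b2 = 0.3,  0.8, -0.2
v11, v12, c1 = 0.5, -0.1, 0.0
v21, v22, c2 = -0.6, 0.9, 0.3
u1, u2, d    = 1.2, -0.4, 0.05

x1, x2 = 0.7, -1.3

# Layer-by-layer computation.
h1 = sigma(w11 * x1 + w12 * x2 + b1)
h2 = sigma(w21 * x1 + w22 * x2 + b2)
g1 = sigma(v11 * h1 + v12 * h2 + c1)
g2 = sigma(v21 * h1 + v22 * h2 + c2)
y_layered = u1 * g1 + u2 * g2 + d

# The single expanded expression from the text.
y_expanded = (u1 * sigma(v11 * sigma(w11 * x1 + w12 * x2 + b1)
                         + v12 * sigma(w21 * x1 + w22 * x2 + b2) + c1)
              + u2 * sigma(v21 * sigma(w11 * x1 + w12 * x2 + b1)
                           + v22 * sigma(w21 * x1 + w22 * x2 + b2) + c2)
              + d)

print(np.isclose(y_layered, y_expanded))  # True
```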
Why the Functional View is So Powerful
Thinking of neural networks as functions simplifies many deeper ideas.
- Training becomes function optimization: finding parameters $ \theta $ such that $\mathcal{N}(x; \theta)$ approximates the desired behavior (a small sketch follows this list).
- Overfitting becomes learning a function that captures noise instead of general patterns.
- Generalization becomes studying how the learned function behaves on unseen inputs.
- Architecture design becomes choosing how to structure function transformations: convolutional, recurrent, transformer-based, and so on.
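As a bare-bones sketch of that first point, training as function optimization (the one-parameter model, the toy data, and the learning rate are illustrative assumptions): gradient descent nudges $\theta$ until $\mathcal{N}(x; \theta)$ matches the desired behavior.

```python
import numpy as np

# Toy data generated from y = 2x (the "desired behavior").
xs = np.linspace(-1, 1, 20)
ys = 2.0 * xs

theta = 0.0                         # single parameter: N(x; theta) = theta * x
lr = 0.1                            # learning rate

for step in range(200):
    preds = theta * xs
    grad = np.mean(2 * (preds - ys) * xs)   # d/d(theta) of the mean squared error
    theta -= lr * grad

print(round(theta, 3))              # approximately 2.0
```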
The functional viewpoint also makes it natural to understand why deeper networks are more powerful:
Every additional layer allows composing more nonlinear transformations, which increases the expressive richness of the overall function.
Finally, the famous Universal Approximation Theorem, which states that a sufficiently wide network can approximate any continuous function on a compact domain to arbitrary accuracy, becomes less surprising.
Neural networks are simply very flexible, learnable function constructors.
Conclusion
At the deepest level, a neural network is not a black box.
It is a continuous, almost everywhere differentiable function, built systematically from basic ingredients: linear maps, nonlinear activations, and parameterized combinations.
Understanding this transforms how we think about neural networks:
- Not as mysterious devices,
- But as logical, controllable mathematical objects.
Whenever confusion arises — about architectures, training dynamics, model behavior — it is helpful to return to this foundation:
A neural network is nothing but a function — a learnable, highly expressive function.
Everything else flows naturally from this.
Note:
This post was developed with structured human domain expertise, supported by AI for clarity and organization.
Despite careful construction, subtle errors or gaps may exist. If you spot anything unclear or have suggestions, please reach out via our members-only chat — your feedback helps us make this resource even better for everyone!