Neural Networks
Layers, weights, forward passes, and activation functions explained
Overview
A neural network is a stack of layers that transform input data through learned weights and nonlinear activation functions. Each neuron computes a weighted sum of inputs plus a bias, then passes the result through an activation function. Stacking many layers enables deep learning—learning hierarchical features (edges → shapes → objects in vision; characters → words → meaning in text).
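The forward pass runs input through the network to produce an output. During training, backpropagation computes the gradient of the loss with respect to each weight and bias, and an optimizer (such as gradient descent) uses those gradients to update the parameters and reduce the loss.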
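To make the forward pass and weight update concrete, here is a minimal sketch of one training step for a single linear neuron. The squared-error loss, the learning rate, and the names `train_step`, `lr`, and `target` are illustrative assumptions, not part of any particular framework.

```python
def train_step(inputs, weights, bias, target, lr=0.1):
    """One forward pass plus one gradient-descent update (hypothetical helper)."""
    # Forward pass: weighted sum plus bias (no activation, to keep the math short)
    prediction = sum(x * w for x, w in zip(inputs, weights)) + bias
    error = prediction - target
    loss = 0.5 * error ** 2  # assumed loss: half squared error

    # Backward pass: d(loss)/d(w_i) = error * x_i and d(loss)/d(bias) = error
    grad_w = [error * x for x in inputs]
    grad_b = error

    # Update: step each parameter against its gradient
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b
    return new_weights, new_bias, loss


weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(20):
    weights, bias, loss = train_step([1.0, 2.0], weights, bias, target=3.0)
    losses.append(loss)
```

Each call records the loss before its update, so `losses` should shrink from step to step on this toy example. Real networks apply the same idea to every layer through the chain rule.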
Syntax / Usage
Core components:
Input layer → raw features (pixels, token embeddings)
Hidden layers → learned transformations
Output layer → predictions (class logits, regression value, etc.)
Weight (W) → strength of connection between neurons
Bias (b) → offset added before activation
Activation (σ) → nonlinearity (ReLU, sigmoid, softmax)
Popular activation functions:
| Function | Use case |
|---|---|
| ReLU | Hidden layers; cheap to compute, mitigates vanishing gradients |
| Sigmoid | Binary output probabilities (0–1) |
| Softmax | Multi-class probabilities (sum to 1) |
| GELU / Swish | Common in modern transformers (LLMs) |
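The functions in the table can be sketched in plain Python as follows. This is an illustrative version using only the standard library. `gelu` here is the exact erf-based form, and `swish` assumes beta = 1 (also known as SiLU). Frameworks may use approximations or other defaults.

```python
import math


def relu(x: float) -> float:
    # Pass positive values through, clamp negatives to zero
    return max(0.0, x)


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits: list[float]) -> list[float]:
    # Subtracting the max avoids overflow and does not change the result
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def swish(x: float) -> float:
    # Swish with beta = 1 (assumption)
    return x * sigmoid(x)
```

Note that `softmax` takes a whole vector of logits, while the others act on one value at a time.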
Forward pass (simplified single neuron):
def relu(x: float) -> float:
    # ReLU: pass positive values through, clamp negatives to zero
    return max(0.0, x)

def forward_neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # Weighted sum of inputs plus bias, then the activation
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(z)
Deep networks repeat this per layer: output = activation(W @ input + b), with each layer's output becoming the next layer's input.
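The per-layer rule can be sketched without any libraries by treating W as a list of rows, one row of weights per output neuron. The helper names `dense` and `forward` and the tiny hand-picked weights are assumptions for illustration only.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def dense(inputs, weights, biases, activation=relu):
    """Compute activation(W @ inputs + b) for one layer (hypothetical helper)."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, row)) + b
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    # Feed each layer's output into the next layer
    for weights, biases in layers:
        inputs = dense(inputs, weights, biases)
    return inputs


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),  # 2 inputs -> 2 neurons
    ([[1.0, 1.0]], [-1.0]),                   # 2 inputs -> 1 neuron
]
result = forward([2.0, 1.0], layers)
```

Frameworks perform the same computation with batched matrix multiplications, which is far faster than Python loops.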
Examples
Conceptual 3-layer classifier for tabular data:
Input (10 features) → Dense(64, ReLU) → Dense(32, ReLU) → Dense(3, Softmax)
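A rough sketch of that architecture with untrained random weights, using only the standard library. The initialization range, the seed, and the helper names (`init_layer`, `linear`, `classify`) are illustrative assumptions. A real project would normally build this with a framework.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def softmax(logits: list[float]) -> list[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random):
    # Small random weights and zero biases; the scale is an arbitrary choice
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(inputs, weights, biases):
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def classify(features, layers):
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = [relu(z) for z in linear(hidden, weights, biases)]
    logits = linear(hidden, *layers[-1])  # final layer: softmax, not ReLU
    return softmax(logits)


rng = random.Random(0)
sizes = [10, 64, 32, 3]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = classify([0.5] * 10, layers)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

With these layer sizes, `n_params` comes to 2,883 (704 + 2,080 + 99 weights and biases). `probs` holds three class probabilities that sum to 1, though they are meaningless until the network is trained.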
Transformers (used in LLMs) interleave attention layers with dense feed-forward blocks rather than relying on dense stacks alone; see large-language-models.
Common Mistakes
- Assuming more layers always help; deep models overfit small datasets quickly
- Using softmax on hidden layers (it is meant for final multi-class outputs)
- Ignoring input normalization; unscaled features slow or destabilize training
- Confusing parameter count (millions/billions) with understanding
- Trying to hand-tune weights instead of using frameworks and pretrained checkpoints
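For the normalization point, one common approach is to standardize each feature to zero mean and unit variance. The sketch below assumes a single feature column held in a Python list. The function name `standardize` and the sample values are made up for illustration.

```python
import statistics


def standardize(column: list[float]) -> list[float]:
    """Rescale a feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


incomes = [30_000.0, 45_000.0, 60_000.0, 75_000.0]  # illustrative values
scaled = standardize(incomes)
```

In practice the mean and standard deviation are computed on the training set only and then reused for validation and test data.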
See Also
machine-learning-basics, large-language-models, embeddings