stackademic

Neural Networks

Layers, weights, forward passes, and activation functions explained

Overview

A neural network is a stack of layers that transform input data through learned weights and nonlinear activation functions. Each neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function. Stacking many layers is what makes learning "deep": successive layers learn hierarchical features (edges → shapes → objects in vision; characters → words → meaning in text).

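The forward pass runs an input through the network to produce an output. During training, backpropagation computes the gradient of the loss with respect to every weight and bias, and an optimizer (such as gradient descent) uses those gradients to update the parameters and reduce the loss.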
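One training step can be sketched for the simplest possible case. The snippet below assumes a single linear neuron (no activation), a squared-error loss, and plain gradient descent; `train_step` and the learning rate `lr` are illustrative names, not part of any library.

```python
def train_step(inputs, weights, bias, target, lr=0.1):
    # Forward pass: prediction is the weighted sum of inputs plus the bias.
    pred = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Squared-error loss for this single example.
    error = pred - target
    loss = error ** 2
    # Backward pass (chain rule): dL/dw_i = 2 * error * x_i, dL/db = 2 * error.
    grad_w = [2 * error * x for x in inputs]
    grad_b = 2 * error
    # Gradient descent update: move each parameter against its gradient.
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b
    return new_weights, new_bias, loss

# Two consecutive steps on one example; the loss should shrink.
w, b, loss_before = train_step([1.0, 2.0], [0.0, 0.0], 0.0, target=1.0)
w, b, loss_after = train_step([1.0, 2.0], w, b, target=1.0)
print(loss_before, loss_after)
```

Frameworks automate the gradient computation for arbitrarily deep networks, so the derivatives are rarely written by hand as they are here.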

Syntax / Usage

Core components:

Input layer    → raw features (pixels, token embeddings)
Hidden layers  → learned transformations
Output layer   → predictions (class logits, regression value, etc.)

Weight (W)     → strength of connection between neurons
Bias (b)       → offset added before activation
Activation (σ) → nonlinearity (ReLU, sigmoid, softmax)

Popular activation functions:

Function        Use case
ReLU            Hidden layers; fast, mitigates vanishing gradients
Sigmoid         Binary output probabilities (0–1)
Softmax         Multi-class probabilities (sum to 1)
GELU / Swish    Common in modern transformers (LLMs)
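For reference, the functions in the table other than ReLU (which appears in the single-neuron snippet) can be written in a few lines of plain Python. This is a minimal sketch using only the standard library. The function names are illustrative, GELU is shown in its exact erf-based form (frameworks often also offer a tanh approximation), and Swish is shown with its scaling parameter fixed at 1, the form also known as SiLU.

```python
import math

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax(logits: list[float]) -> list[float]:
    # Subtracting the max logit avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gelu(x: float) -> float:
    # x times the standard normal CDF evaluated at x.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x: float) -> float:
    # Swish with beta = 1 (SiLU): x * sigmoid(x).
    return x * sigmoid(x)

probs = softmax([1.0, 2.0, 3.0])
print(sigmoid(0.0), sum(probs))
```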

Forward pass (simplified single neuron):

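def relu(x: float) -> float:
    # ReLU: pass positive values through, clamp negatives to zero.
    return max(0.0, x)

def forward_neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # Pre-activation z: weighted sum of the inputs plus the bias.
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    # Apply the nonlinearity to get the neuron's output.
    return relu(z)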

Deep networks repeat this computation layer by layer: output = activation(W @ input + b), with each layer's output becoming the next layer's input.
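The per-layer rule above can be sketched in plain Python as a loop over layers. This is an illustration under assumptions: `dense`, `forward`, and the tiny hand-picked weights are hypothetical, each weight matrix is stored as one row per output neuron, and real code would use a framework's tensor operations rather than Python lists.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def identity(x: float) -> float:
    return x

def dense(inputs, weights, biases, activation):
    # One output per weight row: activation(row · inputs + bias).
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    # Feed each layer's output into the next layer.
    out = inputs
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out

# 2 inputs -> 2 hidden units (ReLU) -> 1 output (no activation).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[1.0, 2.0]], [0.5], identity),
]
result = forward([2.0, 1.0], layers)
print(result)  # → [2.5]
```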

Examples

Conceptual 3-layer classifier for tabular data:

Input (10 features) → Dense(64, ReLU) → Dense(32, ReLU) → Dense(3, Softmax)
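To make the size of this architecture concrete, the sketch below counts its trainable parameters. It assumes each Dense layer is fully connected with one bias per output unit, which is the usual convention but is not spelled out above; `dense_param_count` is an illustrative helper.

```python
def dense_param_count(n_in: int, n_out: int) -> int:
    # One weight per (input, output) pair, plus one bias per output unit.
    return n_in * n_out + n_out

# Layer widths from input to output: 10 -> 64 -> 32 -> 3.
layer_sizes = [10, 64, 32, 3]
counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(counts)
print(counts, total)  # → [704, 2080, 99] 2883
```

Most of the parameters sit in the middle layer, because the count grows with the product of adjacent layer widths.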

Transformers (used in LLMs) interleave attention layers with dense feed-forward layers rather than relying on dense stacks alone; see large-language-models.

Common Mistakes

  • Assuming more layers always help—small datasets overfit quickly
  • Using softmax on hidden layers (meant for final multi-class outputs)
  • Ignoring input normalization—unscaled features slow or destabilize training
  • Confusing parameter count (millions or billions) with understanding; a larger model is not automatically a more correct one
  • Trying to hand-tune weights instead of using frameworks and pretrained checkpoints
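The input-normalization point in the list above can be illustrated with standardization (zero mean, unit variance), one common scaling choice among several. This is a minimal sketch using only the standard library; `fit_standardizer`, `standardize`, and the sample values are hypothetical.

```python
import statistics

def fit_standardizer(train_column):
    # Compute the statistics from training data only.
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    # Guard against constant columns to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)

def standardize(column, mean, std):
    # Shift to zero mean and scale to unit variance.
    return [(x - mean) / std for x in column]

train = [50.0, 60.0, 70.0]
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
# Reuse the training statistics for unseen data.
test_scaled = standardize([80.0], mean, std)
print(train_scaled, test_scaled)
```

Reusing the training mean and standard deviation on validation and test data keeps all inputs on the same scale the network saw during training.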

See Also

  • machine-learning-basics
  • large-language-models
  • embeddings