Neural Networks

Overview

A neural network is a stack of layers that transform input data through learned weights and nonlinear activation functions. Each neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function. Stacking many layers is what makes learning "deep": successive layers learn hierarchical features (edges → shapes → objects in vision; characters → words → meaning in text).

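The forward pass runs input through the network to produce an output. During training, a loss function measures how far that output is from the target, backpropagation computes the gradient of the loss with respect to every weight, and an optimizer (for example, gradient descent) uses those gradients to update the weights and reduce the loss.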
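The training loop described above can be sketched for the smallest possible case: one linear neuron with a squared-error loss and plain gradient descent. The function name `train_step`, the learning rate, and the single training pair are illustrative assumptions, not part of any particular framework.

```python
def train_step(w: float, b: float, x: float, y: float, lr: float) -> tuple[float, float, float]:
    # Forward pass: a linear neuron followed by a squared-error loss.
    pred = w * x + b
    loss = (pred - y) ** 2
    # Backward pass: the chain rule gives the gradient for each parameter.
    dloss_dpred = 2.0 * (pred - y)
    grad_w = dloss_dpred * x
    grad_b = dloss_dpred
    # Optimizer step: move each parameter against its gradient.
    return w - lr * grad_w, b - lr * grad_b, loss

# Hypothetical single training example: input 2.0 should map to 4.0.
w, b = 0.0, 0.0
losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, x=2.0, y=4.0, lr=0.05)
    losses.append(loss)
```

With these values the loss shrinks on every step. Real frameworks compute the gradients automatically, but the forward, backward, and update steps follow the same order.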

Syntax / Usage

Core components:

Input layer    → raw features (pixels, token embeddings)
Hidden layers  → learned transformations
Output layer   → predictions (class logits, regression value, etc.)

Weight (W)     → strength of connection between neurons
Bias (b)       → offset added before activation
Activation (σ) → nonlinearity (ReLU, sigmoid, softmax)

Popular activation functions:

Function	Use case
ReLU	Hidden layers; cheap to compute, mitigates vanishing gradients for positive inputs
Sigmoid	Binary output probabilities (0–1)
Softmax	Multi-class probabilities (sum to 1)
GELU / Swish	Common in modern transformers (LLMs)
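For reference, the functions in the table can be written with only the standard library. This is a minimal sketch: the GELU shown is the exact form based on the normal CDF (some libraries use a tanh approximation instead), and Swish is shown with beta fixed at 1, which is an assumption made here for simplicity.

```python
import math

def sigmoid(x: float) -> float:
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax(logits: list[float]) -> list[float]:
    # Subtracting the max leaves the result unchanged but avoids overflow.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gelu(x: float) -> float:
    # Exact GELU: x times the standard normal CDF evaluated at x.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x: float) -> float:
    # Swish with beta = 1 (also known as SiLU).
    return x * sigmoid(x)
```

Sigmoid, GELU, and Swish act on one value at a time, while softmax acts on a whole vector of logits, which is why it belongs on the output layer.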

Forward pass (simplified single neuron):

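def relu(x: float) -> float:
    # ReLU: pass positive values through, clamp negatives to zero.
    return max(0.0, x)

def forward_neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # zip() silently truncates to the shorter list, so check lengths first.
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of inputs plus bias (the pre-activation value z).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)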

Deep networks repeat the same computation at every layer: output = activation(W @ input + b), where each layer's output becomes the next layer's input.
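The per-layer formula can be sketched in pure Python, without a matrix library. The helper names `dense_forward` and `network_forward` and the tiny hand-picked weights are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from typing import Callable

def relu(x: float) -> float:
    return max(0.0, x)

def dense_forward(
    inputs: list[float],
    weights: list[list[float]],
    biases: list[float],
    activation: Callable[[float], float],
) -> list[float]:
    # weights[j] holds the incoming weights of output neuron j,
    # so each row computes one entry of W @ input + b.
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs: list[float], layers: list[tuple[list[list[float]], list[float]]]) -> list[float]:
    # Feed each layer's output into the next layer.
    out = inputs
    for weights, biases in layers:
        out = dense_forward(out, weights, biases, relu)
    return out

# Two inputs -> two hidden neurons -> one output neuron.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 1.0]], [-1.0]),
]
result = network_forward([2.0, 1.0], layers)  # hidden = [1.0, 2.5], result = [2.5]
```

In practice the nested loops are replaced by a single matrix multiplication, which is what the `@` in the formula stands for.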

Examples

Conceptual 3-layer classifier for tabular data:

Input (10 features) → Dense(64, ReLU) → Dense(32, ReLU) → Dense(3, Softmax)
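A minimal sketch of this architecture with untrained weights, mainly to show the layer shapes and the parameter count. The initialization scheme, the seed, and the helper names (`init_layer`, `affine`, `classify`) are assumptions made for illustration; a real project would use a framework's layer classes instead.

```python
import math
import random

def init_layer(n_in: int, n_out: int, rng: random.Random):
    # Small uniform random weights and zero biases (illustrative init only).
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(x, weights, biases):
    # One row per output neuron: W @ x + b.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, layers):
    h = x
    # ReLU on the hidden layers, softmax only on the final layer.
    for weights, biases in layers[:-1]:
        h = [max(0.0, z) for z in affine(h, weights, biases)]
    final_weights, final_biases = layers[-1]
    return softmax(affine(h, final_weights, final_biases))

rng = random.Random(0)
sizes = [10, 64, 32, 3]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# (10*64 + 64) + (64*32 + 32) + (32*3 + 3) = 2883 trainable parameters.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

features = [rng.gauss(0.0, 1.0) for _ in range(10)]
probs = classify(features, layers)  # three class probabilities summing to 1
```

Because the weights are random, the probabilities carry no meaning until the network is trained.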

Transformers (used in LLMs) interleave attention layers with dense feed-forward layers instead of relying on a plain dense stack; see large-language-models.

Common Mistakes

Assuming more layers always help: deeper networks overfit small datasets quickly
Using softmax on hidden layers (it is meant for final multi-class outputs)
Ignoring input normalization: unscaled features slow or destabilize training
Confusing parameter count (millions or billions) with understanding
Trying to hand-tune weights instead of training with a framework or starting from a pretrained checkpoint
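The input normalization point can be illustrated with standardization (zero mean, unit variance), one common option among several. The function names and the income column below are hypothetical example data.

```python
import statistics

def fit_standardizer(column: list[float]) -> tuple[float, float]:
    # Compute the statistics once, ideally on the training split only.
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Guard against constant columns, which would divide by zero.
    return mean, (std if std > 0 else 1.0)

def standardize(column: list[float], mean: float, std: float) -> list[float]:
    return [(v - mean) / std for v in column]

# Hypothetical feature with a large raw scale.
incomes = [30_000.0, 45_000.0, 60_000.0, 75_000.0]
mean, std = fit_standardizer(incomes)
scaled = standardize(incomes, mean, std)
```

The same `mean` and `std` should be reused for validation and test data, so that all splits are scaled consistently.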
