Neural Networks Explained: From Neurons to Deep Learning

The Biological Inspiration (and Its Limits)

Neural networks are loosely inspired by the human brain. A biological neuron receives signals from many other neurons, integrates those signals, and fires if the total input crosses a threshold.

Artificial neurons work similarly — but calling them "brain-like" is like calling a paper airplane "airplane-like." The abstraction is useful. The comparison has limits.

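What actually matters: artificial neural networks are universal function approximators. With enough hidden units, a network can approximate any continuous function on a bounded input range to arbitrary accuracy. That is a statement about what a network can represent; whether training actually finds those weights is a separate, practical question.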

What a Neuron Does

A single artificial neuron does three things:

Takes multiple inputs (numbers)
Multiplies each input by a weight (its importance) and sums the results
Adds a bias, then applies an activation function

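def neuron(inputs, weights, bias, activation):
    # Weighted sum: each input is scaled by its weight, then all are added up
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    total = weighted_sum + bias
    # The activation function (e.g. relu or sigmoid) adds the non-linearity
    return activation(total)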

The output is passed to the next layer. Weights and biases are what the network learns — everything else is fixed architecture.

Activation Functions

Without activation functions, stacking layers of neurons is pointless — linear operations compose into another linear operation, no matter how many layers you add.

Activation functions introduce non-linearity, which is what allows networks to learn complex patterns.
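To make the "stacking is pointless" claim concrete, here is a small sketch in plain Python. The weights and the helper names (matvec, matmul) are made up for illustration, and biases are left out for brevity. Two linear layers applied in sequence give exactly the same output as one layer whose weight matrix is their product; putting a ReLU between them breaks that equivalence.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0, a) for a in v]

# Illustrative weights for two layers (hypothetical values)
W1 = [[1, 2], [3, 4]]
W2 = [[0, 1], [2, -1]]
x = [5, -3]

two_layers = matvec(W2, matvec(W1, x))           # layer 1, then layer 2
one_layer = matvec(matmul(W2, W1), x)            # a single merged layer
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # non-linearity in between

print(two_layers, one_layer, with_relu)
```

The first two results are identical, so the second layer added no expressive power. The third differs, because the ReLU zeroed a negative intermediate value that the merged matrix cannot account for.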

ReLU (Rectified Linear Unit)

The most commonly used activation in hidden layers:

def relu(x):
    return max(0, x)

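It's simple: negative inputs become zero, positive inputs pass through unchanged. It is cheap to compute, and because its gradient is exactly 1 for positive inputs it does not saturate the way sigmoid does, which helps gradients survive through many layers. The zeros it produces also make activations sparse. One known drawback: a neuron whose input stays negative outputs zero and stops learning (the "dying ReLU" problem).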

Sigmoid

Squashes any real-valued input into the range (0, 1). Used in binary classification output layers, where the result is read as a probability:

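import math

def sigmoid(x):
    # Branch on the sign so math.exp never gets a large positive argument,
    # which would raise OverflowError for very negative x
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)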

Softmax

Converts a vector of numbers into a probability distribution (all values sum to 1). Used for multi-class classification outputs.
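A minimal sketch of softmax in plain Python, in the same spirit as the ReLU and sigmoid snippets. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result; the input values are made up.

```python
import math

def softmax(values):
    """Turn a list of real numbers into probabilities that sum to 1."""
    # Shifting by the max keeps math.exp from overflowing on large inputs
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Larger inputs get larger probabilities, and the ordering of the inputs is preserved in the output.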

Building a Network

Networks are organized in layers:

Input layer: receives the raw data
Hidden layers: compute intermediate representations of the input
Output layer: produces the prediction

Input Layer    Hidden Layer 1    Hidden Layer 2    Output Layer
[pixel 1]  →   [neuron 1]   →    [neuron 1]   →   [cat: 0.92]
[pixel 2]  →   [neuron 2]   →    [neuron 2]   →   [dog: 0.07]
[pixel 3]  →   [neuron 3]   →    [neuron 3]   →   [bird: 0.01]
...
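The diagram draws one arrow per row, but in a fully connected network every neuron receives the outputs of all neurons in the previous layer. The sketch below shows one forward pass through such layers in plain Python. The layer sizes, weights, and the helper names dense and identity are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Made-up parameters: 3 inputs -> 2 hidden neurons -> 1 output
hidden_w = [[0.5, -1.0, 0.0], [1.0, 1.0, 1.0]]
hidden_b = [0.0, -1.0]
out_w = [[2.0, 0.5]]
out_b = [0.1]

pixels = [1.0, 2.0, 3.0]
hidden = dense(pixels, hidden_w, hidden_b, relu)
output = dense(hidden, out_w, out_b, identity)
print(hidden, output)
```

Each row of a weight matrix belongs to one neuron, so a layer is just the single-neuron computation repeated once per row.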

"Deep" learning simply means a network with many hidden layers. Depth allows the network to learn increasingly abstract representations:

Layer 1: edges and lines
Layer 2: shapes and textures
Layer 3: object parts
Layer 4: whole objects

How Networks Learn: Backpropagation

Training is where the weights and biases are actually learned. One training step has four parts.

Forward pass: data flows through the network, producing a prediction.

Loss calculation: we measure how wrong the prediction was using a loss function (e.g., cross-entropy for classification, mean squared error for regression).
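As a rough illustration of the two losses just named, here are minimal plain-Python versions. The cross-entropy function assumes the model already outputs probabilities (for instance from softmax) and that the label is a class index; the numbers reuse the cat/dog/bird outputs from the diagram and are otherwise made up.

```python
import math

def mean_squared_error(predictions, targets):
    """Average squared difference, typically used for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probabilities, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_class])

# Same prediction, two possible true labels
confident_right = cross_entropy([0.92, 0.07, 0.01], true_class=0)
confident_wrong = cross_entropy([0.92, 0.07, 0.01], true_class=2)

mse = mean_squared_error([2.5, 0.0], [3.0, -1.0])
print(confident_right, confident_wrong, mse)
```

Cross-entropy is small when the correct class gets high probability and grows sharply when the model is confidently wrong, which is what makes it a useful training signal.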

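Backward pass (backpropagation): the error is propagated backward through the network using the chain rule. Each weight gets a gradient: the rate at which the loss changes as that weight increases. A positive gradient means increasing the weight raises the loss, so the update moves the weight the opposite way.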
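For a single neuron the backward pass can be worked out by hand. The sketch below assumes a one-input linear neuron with squared-error loss (an illustrative setup, not the network used later) and checks the chain-rule gradient against a finite-difference estimate.

```python
def loss(w, b, x, target):
    """Squared error of a one-input linear neuron."""
    prediction = w * x + b
    return (prediction - target) ** 2

def gradients(w, b, x, target):
    """Chain rule: d(loss)/d(prediction) times d(prediction)/d(parameter)."""
    d_pred = 2 * (w * x + b - target)
    return d_pred * x, d_pred * 1.0   # d(loss)/dw, d(loss)/db

w, b, x, target = 0.5, 0.0, 2.0, 3.0
grad_w, grad_b = gradients(w, b, x, target)

# Finite-difference check: nudge w slightly and see how the loss responds
eps = 1e-6
numeric_grad_w = (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)
print(grad_w, grad_b, numeric_grad_w)
```

Both gradients come out negative here, meaning the loss falls as w and b increase. Frameworks such as PyTorch automate exactly this bookkeeping across every layer.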

Gradient descent: weights are updated in the direction that reduces loss:

weight = weight - learning_rate * gradient

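Repeat this over many batches of examples, across many epochs, and the loss usually falls until the network settles on weights that make good predictions. Convergence is not guaranteed: a learning rate that is too high can make the loss diverge, and one that is too low makes training crawl.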
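The whole loop fits in a few lines for a toy problem. This sketch fits a single weight and bias to made-up data drawn from y = 2x + 1, using the update rule above with mean squared error; the learning rate and epoch count are arbitrary choices that happen to work for this data.

```python
# Toy data generated from y = 2x + 1 (made up for illustration)
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = 0.0, 0.0
learning_rate = 0.05

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (weight * x + bias) - y      # forward pass and residual
        grad_w += 2 * error * x / len(data)  # gradient of mean squared error
        grad_b += 2 * error / len(data)
    # Step against the gradient to reduce the loss
    weight = weight - learning_rate * grad_w
    bias = bias - learning_rate * grad_b

print(weight, bias)
```

The learned values end up very close to 2 and 1. A real network does the same thing with many more parameters and with gradients supplied by backpropagation.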

Building a Neural Network with PyTorch

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 256),   # input: 28x28 image flattened
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10),    # output: 10 classes (digits 0-9)
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleNet()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

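# Training loop
# Assumes train_dataloader is a torch.utils.data.DataLoader that yields
# (images, labels) batches of MNIST; it is not defined in this snippet.
for epoch in range(10):
    running_loss = 0.0
    for images, labels in train_dataloader:
        images = images.view(-1, 784)  # flatten each 28x28 image to 784 values

        predictions = model(images)    # raw logits; CrossEntropyLoss applies log-softmax itself
        loss = loss_fn(predictions, labels)

        optimizer.zero_grad()          # clear gradients from the previous step
        loss.backward()                # backpropagation
        optimizer.step()               # weight update

        running_loss += loss.item()

    # Report the mean batch loss for the epoch, not just the last batch
    print(f"Epoch {epoch+1}: Loss = {running_loss / len(train_dataloader):.4f}")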

Given a DataLoader over the MNIST training set of handwritten digits, a network like this typically reaches around 98% test accuracy within a few minutes of training.

Common Architectures

Feedforward Networks (MLP): what we built above. Every neuron in one layer connects to every neuron in the next. A good fit for tabular data and simple classification.

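Convolutional Neural Networks (CNN): designed for images. They slide small learned filters across the image, so a pattern such as an edge or a shape is detected wherever it appears; pooling layers then make the result less sensitive to exact position. Long dominant in computer vision.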
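A one-dimensional sketch shows the idea of a sliding filter. The conv1d helper and the hand-picked edge filter are illustrative assumptions (a real CNN learns its filters and works in two dimensions), and like most deep learning libraries it computes a cross-correlation and calls it convolution.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

edge_filter = [-1, 1]            # responds where the value jumps up

early_edge = [0, 0, 1, 1, 1, 1]  # step between index 1 and 2
late_edge = [0, 0, 0, 0, 1, 1]   # same step, shifted right by 2

early_response = conv1d(early_edge, edge_filter)
late_response = conv1d(late_edge, edge_filter)
print(early_response, late_response)
```

The same filter fires for both inputs, and the peak in the response moves with the edge. Taking the maximum over the response, as a pooling layer would, gives the same value for both and discards where the edge was.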

Recurrent Neural Networks (RNN): process sequences by maintaining state across time steps. Historically the default for text and speech; largely displaced by transformers for most language tasks.
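"Maintaining state" can be sketched with a recurrent cell that has a single hidden unit. The weights below (w_x, w_h, b) are arbitrary illustrative values and the tanh activation is one common choice; practical RNNs use vectors and learned weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: combine the new input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0                      # initial state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # the same weights are reused at every step
        states.append(h)
    return states

states_a = run_rnn([1.0, 0.0, 0.0])  # a single input at the first step
states_b = run_rnn([0.0, 0.0, 0.0])  # no input at all
print(states_a, states_b)
```

In the first run the state stays non-zero after the input has gone back to zero: the cell carries a fading memory of what it saw earlier in the sequence.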

Transformers: the current state of the art for language and many other tasks. Attention mechanisms let every token relate to every other token in the input. They power GPT, BERT, and essentially all modern LLMs.
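The core of attention is short enough to sketch in plain Python. This is scaled dot-product attention under simplifying assumptions: the token vectors are made up, and the same vectors serve as queries, keys, and values, whereas a real transformer first passes them through separate learned projections and uses several attention heads.

```python
import math

def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token's query to every token's key
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)   # how much this token attends to each token
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
print(weights)
```

Each row of weights sums to 1 and says how strongly one token draws on every other token, which is the "every token relates to every other token" behaviour described above.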

What Makes Deep Learning Powerful

The answer is learned representations. Instead of engineering features by hand (which doesn't scale), deep networks learn which features matter directly from raw data.

This is why the same basic recipe (a differentiable network, a loss function, and gradient descent) can learn to recognize cats, detect tumors, predict protein structures, and translate languages. With different training data, the same fundamental approach works.

The practical implication: in many applied projects, the bottleneck is less the algorithm than the quality and quantity of the data.

Where to Go Next

From here, the deep learning curriculum branches:

Computer Vision: CNNs, object detection, image segmentation
NLP: Transformers, BERT, fine-tuning language models
Generative Models: GANs, VAEs, diffusion models

Start with the PyTorch official tutorials. Implement models from scratch before using high-level APIs — that's where the real understanding is built.