Neural Networks Explained: From Neurons to Deep Learning
How do neural networks actually work? This article builds the intuition from first principles — no math degree required.
The Biological Inspiration (and Its Limits)
Neural networks are loosely inspired by the human brain. A biological neuron receives signals from many other neurons, integrates those signals, and fires if the total input crosses a threshold.
Artificial neurons work similarly — but calling them "brain-like" is like calling a paper airplane "airplane-like." The abstraction is useful. The comparison has limits.
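What actually matters: artificial neural networks are universal function approximators. In theory, a network with a single hidden layer and enough neurons can approximate any continuous function (over a bounded range of inputs) as closely as you like. In practice, whether training actually finds those weights depends on the data, the architecture, and the optimizer.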
What a Neuron Does
A single artificial neuron does three things:
- Takes multiple inputs (numbers)
- Multiplies each input by a weight (its importance)
- Sums the weighted inputs, adds a bias, and applies an activation function
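def neuron(inputs, weights, bias, activation):
    # Weighted sum: x1*w1 + x2*w2 + ... + xn*wn
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    total = weighted_sum + bias
    # activation is any non-linear function, e.g. relu or sigmoid below
    return activation(total)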
The output is passed to the next layer. Weights and biases are what the network learns — everything else is fixed architecture.
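To make the arithmetic concrete, here is a neuron evaluated on hypothetical inputs, weights, and biases, chosen to be exact in binary floating point so the results are easy to check by hand. It assumes ReLU (covered below) as the activation and passes the activation in as a parameter; both are illustrative choices, not part of any library.

```python
def relu(x):
    return max(0.0, x)

def neuron(inputs, weights, bias, activation):
    # Weighted sum of inputs, plus bias, passed through the activation
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum + bias)

# Hypothetical values: 1.0*0.5 + 2.0*(-0.25) + 4.0*0.125 = 0.5
inputs = [1.0, 2.0, 4.0]
weights = [0.5, -0.25, 0.125]

print(neuron(inputs, weights, 0.25, relu))  # 0.75  (0.5 + 0.25)
print(neuron(inputs, weights, -1.0, relu))  # 0.0   (0.5 - 1.0 = -0.5, clipped by ReLU)
```

Changing only the bias moves the neuron from "firing" to "silent", which is exactly the threshold behavior borrowed from the biological analogy.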
Activation Functions
Without activation functions, stacking layers of neurons is pointless — linear operations compose into another linear operation, no matter how many layers you add.
Activation functions introduce non-linearity, which is what allows networks to learn complex patterns.
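A small pure-Python sketch of why the non-linearity matters, using made-up integer matrices: two stacked linear layers (W1, b1) and (W2, b2) produce exactly the same output as a single layer with W = W2·W1 and b = W2·b1 + b2. Putting a ReLU between them breaks that equivalence, at least for inputs that push some hidden value below zero.

```python
def matvec(A, x):
    # Matrix-vector product: one dot product per row of A
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    # (A @ B)[i][j] = sum over k of A[i][k] * B[k][j]
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu_vec(v):
    return [max(0, a) for a in v]

# Hypothetical weights for two linear layers
W1, b1 = [[1, 2], [3, 4]], [1, -1]
W2, b2 = [[0, 1], [-1, 2]], [2, 0]

# Collapse both layers into one: W = W2 W1, b = W2 b1 + b2
W, b = matmul(W2, W1), add(matvec(W2, b1), b2)

x = [5, -3]
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)
one_layer = add(matvec(W, x), b)
print(two_layers, one_layer)  # [4, 4] [4, 4]  (identical)

x2 = [1, -3]
with_relu = add(matvec(W2, relu_vec(add(matvec(W1, x2), b1))), b2)
collapsed = add(matvec(W, x2), b)
print(with_relu, collapsed)   # [2, 0] [-8, -16]  (no longer equal)
```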
ReLU (Rectified Linear Unit)
The most commonly used activation in hidden layers:
def relu(x):
    return max(0, x)
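It's simple: negative inputs become zero, positive inputs pass through unchanged. For any positive input its gradient is exactly 1, so unlike sigmoid it does not saturate as inputs grow, which helps gradients flow through deep networks. It is also cheap to compute, and it makes activations sparse, since many neurons output exactly zero.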
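The gradient is what backpropagation actually uses, so it helps to look at it next to the function. This sketch assumes the common convention of treating the gradient at exactly zero as 0, since the derivative is undefined there.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative ones.
    # Undefined at exactly 0; using 0 there is a common convention (an assumption here).
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 1.5, 3.0]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Every positive input passes a full gradient of 1 backward, while negative inputs pass none.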
Sigmoid
Squashes output to (0, 1). Used in binary classification output layers:
import math
def sigmoid(x):
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # for x < 0, avoids overflow in math.exp(-x)
    return z / (1 + z)
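Sigmoid's gradient is σ(x)(1 − σ(x)). It peaks at 0.25 when x = 0 and shrinks toward zero for large positive or negative inputs. This saturation is why ReLU is usually preferred in hidden layers. A quick sketch that prints the gradient over a few sample inputs (the simple formula is fine here because the inputs are moderate):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

for x in [-10, -2, 0, 2, 10]:
    print(f"x={x:>3}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

At x = ±10 the gradient is tiny, so a weight feeding a saturated sigmoid barely moves during training.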
Softmax
Converts a vector of numbers into a probability distribution (all values sum to 1). Used for multi-class classification outputs.
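A minimal softmax sketch: exponentiate each score, then divide by the total. Subtracting the largest score first does not change the result (the factor cancels in the ratio) but keeps math.exp from overflowing on large scores.

```python
import math

def softmax(scores):
    m = max(scores)  # shift for numerical stability; cancels out in the ratio
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])  # [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))                        # [0.5, 0.5]  (no overflow)
```

The largest score gets the largest probability, and the outputs always sum to 1.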
Building a Network
Networks are organized in layers:
- Input layer — receives the raw data
- Hidden layers — intermediate layers that transform the data into progressively more useful representations
- Output layer — produces the prediction
Input Layer   Hidden Layer 1    Hidden Layer 2    Output Layer
[pixel 1]  →  [neuron 1]     →  [neuron 1]     →  [cat:  0.92]
[pixel 2]  →  [neuron 2]     →  [neuron 2]     →  [dog:  0.07]
[pixel 3]  →  [neuron 3]     →  [neuron 3]     →  [bird: 0.01]
...
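The diagram simplifies the wiring: in a fully connected (dense) layer, every neuron receives every output of the previous layer, not just the one beside it. Here is a sketch of a forward pass through a tiny 3 → 2 → 2 → 3 version of that network in pure Python. It uses arbitrary, untrained weights (so the probabilities it prints carry no meaning), ReLU in the hidden layers, and softmax at the output.

```python
import math

def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases, activation):
    # Fully connected layer: neuron j sees every input.
    # weights[j] is neuron j's weight list, biases[j] its bias.
    return [activation(sum(x * w for x, w in zip(inputs, w_row)) + b)
            for w_row, b in zip(weights, biases)]

# Hypothetical, untrained weights for a 3 -> 2 -> 2 -> 3 network
hidden1 = ([[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]], [0.0, 0.1])
hidden2 = ([[0.6, -0.4], [0.1, 0.9]], [0.05, 0.0])
output = ([[1.0, -0.5], [0.2, 0.3], [-0.7, 0.8]], [0.0, 0.0, 0.0])

def forward(pixels):
    h1 = dense(pixels, *hidden1, relu)
    h2 = dense(h1, *hidden2, relu)
    logits = dense(h2, *output, identity)  # raw scores before softmax
    return softmax(logits)

probs = forward([0.9, 0.1, 0.5])
for label, p in zip(["cat", "dog", "bird"], probs):
    print(f"{label}: {p:.2f}")
```

Training changes only the numbers inside hidden1, hidden2, and output. The forward function itself stays the same.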
"Deep" learning simply means a network with many hidden layers. Depth allows the network to learn increasingly abstract representations:
- Layer 1: edges and lines
- Layer 2: shapes and textures
- Layer 3: object parts
- Layer 4: whole objects
How Networks Learn: Backpropagation
Training is where the real magic happens.
Forward pass: data flows through the network, producing a prediction.
Loss calculation: we measure how wrong the prediction was using a loss function (e.g., cross-entropy for classification, mean squared error for regression).
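Backward pass (backpropagation): the error is propagated backward through the network. Each weight gets a gradient: the partial derivative of the loss with respect to that weight, which says how much, and in which direction, the loss would change if that weight were nudged slightly. Backpropagation computes all of these gradients in one sweep by applying the chain rule layer by layer, from the output back toward the input.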
Gradient descent: weights are updated in the direction that reduces loss:
weight = weight - learning_rate * gradient
Repeat this over many batches of training examples and many epochs (full passes over the training data), and the weights typically settle on values that make good predictions.
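All four steps fit in a few lines for a toy network with one input, one ReLU hidden neuron, and one output. This is a sketch under stated assumptions: the data (y = 3x + 1), the starting weights, the learning rate, and squared error as the loss are all hypothetical choices, and the gradients are derived by hand with the chain rule rather than by a framework. Weights are updated after every example (stochastic gradient descent).

```python
def forward(params, x):
    w1, b1, w2, b2 = params
    z = w1 * x + b1      # hidden neuron, before activation
    h = max(0.0, z)      # ReLU
    y_hat = w2 * h + b2  # output neuron (no activation: regression)
    return z, h, y_hat

def loss_and_grads(params, x, y):
    w1, b1, w2, b2 = params
    z, h, y_hat = forward(params, x)  # forward pass
    loss = (y_hat - y) ** 2           # squared-error loss
    # Backward pass: chain rule, from the loss back toward the input
    d_yhat = 2 * (y_hat - y)              # dL/dy_hat
    d_w2, d_b2 = d_yhat * h, d_yhat       # dL/dw2, dL/db2
    d_h = d_yhat * w2                     # dL/dh
    d_z = d_h * (1.0 if z > 0 else 0.0)   # through the ReLU
    d_w1, d_b1 = d_z * x, d_z             # dL/dw1, dL/db1
    return loss, [d_w1, d_b1, d_w2, d_b2]

def mean_loss(params, data):
    return sum(loss_and_grads(params, x, y)[0] for x, y in data) / len(data)

data = [(x, 3 * x + 1) for x in [0.5, 1.0, 1.5, 2.0]]  # hypothetical target
params = [0.5, 0.1, 0.5, 0.0]                          # hypothetical start
learning_rate = 0.01

print(f"before: {mean_loss(params, data):.4f}")
for epoch in range(200):
    for x, y in data:
        _, grads = loss_and_grads(params, x, y)
        # Gradient descent: weight = weight - learning_rate * gradient
        params = [p - learning_rate * g for p, g in zip(params, grads)]
print(f"after:  {mean_loss(params, data):.4f}")
```

A useful habit when writing gradients by hand is to compare them with a finite-difference estimate, (L(w + ε) − L(w − ε)) / 2ε, for each weight; frameworks like PyTorch do the chain-rule bookkeeping automatically.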
Building a Neural Network with PyTorch
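import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 256),  # input: 28x28 image flattened
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10),   # output: 10 classes (digits 0-9)
        )
    def forward(self, x):
        return self.layers(x)  # raw scores (logits), no softmax here
# MNIST training set via torchvision (downloaded on first run)
train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
model = SimpleNet()
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally, so it expects logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Training loop
for epoch in range(10):
    for images, labels in train_dataloader:
        images = images.view(-1, 784)  # flatten each 1x28x28 image to 784 values
        predictions = model(images)    # forward pass
        loss = loss_fn(predictions, labels)
        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss.backward()        # backpropagation: compute gradients
        optimizer.step()       # update weights (Adam, a gradient descent variant)
    print(f"Epoch {epoch+1}: Loss = {loss.item():.4f}")  # loss of the last batch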
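Trained on MNIST, a network like this typically reaches roughly 98% accuracy on the held-out test set within minutes. Measuring that accuracy requires a separate evaluation pass over the test data, which the loop above does not include.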
Common Architectures
Feedforward Networks (MLP) — what we've built above. Good for tabular data and simple classification.
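Convolutional Neural Networks (CNN) — designed for images. They slide small learned filters across the image, reusing the same weights at every position, so a filter that detects an edge or texture in one spot detects it anywhere. Stacked layers combine these into shapes and objects, and pooling layers add some tolerance to exactly where a feature appears. Dominant in computer vision.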
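A one-dimensional sketch of the idea, with a hand-picked edge-detecting kernel standing in for a learned one (a hypothetical choice). Like PyTorch's nn.Conv1d, it computes cross-correlation (the kernel is not flipped), which is the usual convention in deep learning.

```python
def conv1d(signal, kernel):
    # "Valid" sliding window: the same kernel weights are applied at every position
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Hypothetical hand-picked filter; a CNN would learn these weights
edge_detector = [-1, 1]

print(conv1d([0, 0, 1, 1, 1], edge_detector))  # [0, 1, 0, 0]
print(conv1d([0, 0, 0, 1, 1], edge_detector))  # [0, 0, 1, 0]
```

When the edge moves one step to the right, the detector's response moves one step with it. One small filter covers every position, which is where CNNs get their efficiency.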
Recurrent Neural Networks (RNN) — process sequences by maintaining state across time steps. Historically used for text; mostly replaced by transformers.
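The core of an RNN is a state that is updated once per time step, using the same weights every time. Here is a scalar sketch, assuming a tanh activation and hypothetical weight values (real RNNs use vectors and weight matrices):

```python
import math

def rnn(sequence, w_x, w_h, b):
    # Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    # The same three parameters are reused at every time step.
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights; a trained RNN would learn these
early = rnn([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
late = rnn([0.0, 0.0, 1.0], w_x=0.5, w_h=0.8, b=0.0)
print(f"signal first: final state {early[-1]:.4f}")
print(f"signal last:  final state {late[-1]:.4f}")
```

The same inputs in a different order produce a different final state, because the state carries information forward. Processing one step at a time is also why RNNs are hard to parallelize, one reason transformers displaced them.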
Transformers — the current state of the art for most tasks, especially in language. Attention mechanisms let each token weigh its relationship to every other token in the input (in GPT-style models, to every earlier token). Powers GPT, BERT, and essentially all modern LLMs.
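The attention computation itself is compact. This pure-Python sketch implements scaled dot-product attention, softmax(QKᵀ / √d)·V, for a handful of 2-dimensional tokens. For simplicity it uses the token vectors directly as queries, keys, and values. Real transformers first multiply them by learned projection matrices, use several attention heads, and (in GPT-style models) mask out future tokens.

```python
import math

def attention(queries, keys, values):
    # Scaled dot-product attention, one query at a time
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly this query matches each key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Softmax turns scores into weights that sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is a weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical token vectors, used as queries, keys, and values alike
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the input, and each row sums to 1. The first token, for instance, attends equally to itself and to the third token (both dot products are 1) and less to the second.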
What Makes Deep Learning Powerful
The answer is learned representations. Instead of engineering features by hand (which doesn't scale), deep networks learn which features matter directly from raw data.
This is why the same basic approach can learn to recognize cats, detect tumors, predict protein structures, and translate languages. The architectures differ in the details, but the core recipe of learning features directly from data carries over.
The practical implication: for many real-world problems, the bottleneck is less often the algorithm than the quality and quantity of the training data.
Where to Go Next
From here, the deep learning curriculum branches:
- Computer Vision: CNNs, object detection, image segmentation
- NLP: Transformers, BERT, fine-tuning language models
- Generative Models: GANs, VAEs, diffusion models
Start with the PyTorch official tutorials. Implement models from scratch before using high-level APIs — that's where the real understanding is built.