Tinker With a Neural Network Right Here in Your Browser.
Don’t Worry, You Can’t Break It. We Promise.

Epoch

Data

Which dataset do you want to use?

Features

Which properties do you want to feed in?

This is the output from one neuron. Hover to see it larger.
The outputs are mixed with varying weights, shown by the thickness of the lines.

Output

Test loss
Training loss
Sweep results columns: Sweep, K, Depth, Width, Train loss, Test loss
Legend: true walk line; prediction colored by MSE, low to high
Colors show data, neuron and weight values.

3D Projection

How Does N-Dimensional Random-Walk Regression Work?

The random-walk mode trains a neural network to learn a vector-valued function $f_\theta: \mathbb{R}^N \to \mathbb{R}^N$. Inputs come from sparse, randomly selected cells in an N-dimensional grid, and each selected cell is assigned one position from an N-dimensional random walk.

1. Input regions

The input domain $[-1, 1]^N$ is divided into $K$ bins per dimension, producing $K^N$ possible hypercubic regions. The UI limits $K$ so that $K^N \le 10{,}000$. Only $K$ of these regions are randomly selected.

For region index r, dimension j has grid coordinate

$$c_j = \lfloor r / K^j \rfloor \bmod K.$$

A coordinate is sampled uniformly inside that cell:

$$x_j \sim U\!\left(-1 + \frac{2c_j}{K},\; -1 + \frac{2(c_j + 1)}{K}\right).$$

Each selected region receives exactly M samples, so the complete dataset contains S = KM examples.
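The index decoding and per-cell sampling described above can be sketched in a few lines of Python. This is an illustrative sketch, not the playground's own code: the function names are hypothetical, and it assumes that the dimension index j is counted from 0 so that the N coordinates are the base-K digits of r.

```python
import random

def region_coords(r, K, N):
    """Decode region index r into N base-K grid coordinates (assumes j starts at 0)."""
    return [(r // K**j) % K for j in range(N)]

def sample_in_region(r, K, N, rng):
    """Draw one point uniformly inside the hypercubic cell of region r."""
    point = []
    for c in region_coords(r, K, N):
        lo = -1 + 2 * c / K
        hi = -1 + 2 * (c + 1) / K
        point.append(rng.uniform(lo, hi))
    return point

def make_inputs(K, N, M, seed=0):
    """Pick K distinct regions out of K^N and draw M samples in each."""
    rng = random.Random(seed)
    regions = rng.sample(range(K**N), K)
    return {r: [sample_in_region(r, K, N, rng) for _ in range(M)] for r in regions}
```

For example, `make_inputs(4, 3, 5)` returns 4 regions with 5 points each, for S = KM = 20 examples.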

2. Target random walk

The output walk starts at $w_0 = (0, \dots, 0)$. At each step, the generator chooses a dimension $J_t$ and direction $s_t \in \{-1, +1\}$, then proposes

$$w_t = w_{t-1} + s_t \frac{2}{K} e_{J_t}.$$

Moves outside $[-1, 1]^N$ are rejected. There are $K$ walk steps. The selected input regions are randomly permuted and assigned the unique targets $w_1, \dots, w_K$. Therefore, nearby input regions do not necessarily receive nearby walk targets: the task is a sparse, piecewise-constant mapping $x \in R_r \Rightarrow y = w_{\pi(r)}$.
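A minimal Python sketch of such a walk generator follows. The text does not say what happens after a rejected move, so this sketch assumes the proposal is simply redrawn until K moves have been accepted; the function names and the small floating-point tolerance are also assumptions made for illustration.

```python
import random

def random_walk(K, N, seed=0):
    """Return [w_0, ..., w_K]; assumes a rejected proposal is redrawn."""
    if K < 2:
        raise ValueError("K must be at least 2 so that a legal move always exists")
    rng = random.Random(seed)
    step = 2 / K
    walk = [[0.0] * N]
    while len(walk) <= K:
        j = rng.randrange(N)        # dimension J_t
        s = rng.choice((-1, 1))     # direction s_t
        proposal = list(walk[-1])
        proposal[j] += s * step
        if abs(proposal[j]) <= 1 + 1e-9:   # reject moves that leave [-1, 1]^N
            walk.append(proposal)
    return walk

def assign_targets(regions, walk, seed=0):
    """Randomly permute the selected regions and pair them with w_1..w_K."""
    rng = random.Random(seed)
    shuffled = list(regions)
    rng.shuffle(shuffled)
    return dict(zip(shuffled, walk[1:]))
```

Because the pairing goes through a random permutation, two regions that are adjacent in input space can end up with targets that are far apart along the walk.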

3. Optional Gaussian target noise

When Gaussian noise is enabled, every output coordinate becomes

$$\tilde{y}_j = y_j + \varepsilon_j, \qquad \varepsilon_j \sim \mathcal{N}(\mu, \sigma^2).$$

The variance control is $\sigma^2$; the implementation scales a standard-normal sample by the square root of that value.
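As a rough illustration of scaling a standard-normal draw by the square root of the variance, consider the following Python sketch. The function name and argument order are hypothetical and are not taken from the playground's source.

```python
import math
import random

def add_target_noise(y, mu, variance, rng):
    """Return y with independent N(mu, variance) noise added to each coordinate."""
    sigma = math.sqrt(variance)   # the control is a variance, so take the root
    return [yj + mu + sigma * rng.gauss(0.0, 1.0) for yj in y]
```

Setting the variance to 0 leaves only the constant shift mu, which is a quick way to check that the control is wired up as a variance rather than a standard deviation.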

4. Network and loss

Without an architecture frontend, the network shape is $N \to h_1 \to \dots \to h_L \to N$. Hidden layers use the selected activation and the output layer is linear:

$$z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}, \qquad a^{(\ell)} = \varphi\big(z^{(\ell)}\big).$$
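The layer recurrence can be written out directly. The Python sketch below is only an illustration with hypothetical names: it stores each layer as a `(W, b)` pair with `W` as a list of rows, uses tanh as a stand-in for whichever activation is selected, and skips the activation on the last layer to keep the output linear.

```python
import math

def forward(x, layers, activation=math.tanh):
    """Apply z = W a + b layer by layer; the final layer stays linear."""
    a = list(x)
    for index, (W, b) in enumerate(layers):
        z = [sum(w * ai for w, ai in zip(row, a)) + bi for row, bi in zip(W, b)]
        is_output = index == len(layers) - 1
        a = z if is_output else [activation(zi) for zi in z]
    return a
```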

The displayed train and test loss averages half-squared error over samples and output dimensions:

$$L = \frac{1}{SN} \sum_{i=1}^{S} \sum_{j=1}^{N} \tfrac{1}{2} \left(\hat{y}_{ij} - y_{ij}\right)^2.$$

The output derivative used by backpropagation is $\partial L / \partial \hat{y}_{ij} = \hat{y}_{ij} - y_{ij}$, before the code's batch averaging.
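The displayed loss and the per-output derivative can be sketched as follows. The function names are hypothetical; the gradient function returns the unaveraged value for one sample, matching the "before batch averaging" form stated above.

```python
def half_squared_loss(predictions, targets):
    """Mean of 0.5 * (prediction - target)^2 over S samples and N outputs."""
    S, N = len(targets), len(targets[0])
    total = sum(
        0.5 * (p - t) ** 2
        for pred_row, target_row in zip(predictions, targets)
        for p, t in zip(pred_row, target_row)
    )
    return total / (S * N)

def output_gradient(prediction, target):
    """Derivative of the half-squared error for one sample, not yet averaged."""
    return [p - t for p, t in zip(prediction, target)]
```

The factor of one half is what makes the derivative a plain difference with no leading 2.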

5. Optional fixed frontends

The CNN frontend maps the input to eight deterministic width-two filter features:

$$\varphi_f(x) = \frac{1}{P} \sum_{p} \Big[ b_f + \sum_{q} \omega_{fq}\, x_{p+q} \Big].$$
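One way to read this formula is as a width-two sliding window whose responses are averaged over positions. The Python sketch below makes several assumptions that the text does not confirm: there is no padding, so P is the number of valid window positions (N - 1), and the filter constants are passed in by the caller rather than generated from sines. All names are hypothetical.

```python
def cnn_features(x, filters):
    """filters: list of (bias, (w0, w1)); returns one mean-pooled feature per filter."""
    P = len(x) - 1   # assumed: valid positions only, no padding
    if P < 1:
        raise ValueError("needs at least two input coordinates")
    features = []
    for bias, (w0, w1) in filters:
        responses = [bias + w0 * x[p] + w1 * x[p + 1] for p in range(P)]
        features.append(sum(responses) / P)
    return features
```

With eight `(bias, weights)` pairs this yields the eight features that are then fed to the dense network.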

The Transformer frontend embeds coordinates in 16 dimensions, applies four-head scaled dot-product attention, a residual ReLU transform, and mean pooling. Its attention weights have the usual form

$$\alpha_{ij} = \operatorname{softmax}_j\!\left( \frac{q_i^{\top} k_j}{\sqrt{4}} \right).$$

These frontend coefficients are deterministic sine-generated constants. They are feature transforms, not trainable CNN or Transformer parameters; only the following dense network learns.
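The attention weights for a single head can be sketched as below. With 16 embedding dimensions split across four heads, each head works on 4-dimensional queries and keys, which is where the square root of 4 comes from. This is a generic scaled dot-product softmax written for illustration, with hypothetical names; it does not reproduce the playground's fixed sine-generated projections.

```python
import math

def attention_weights(queries, keys):
    """Row i holds softmax over j of (q_i . k_j) / sqrt(d), with d the head size."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)                        # subtract the max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```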

6. 3D output visualization

For selected output dimensions a, b, and c, a vector v is projected using range R:

$$P(v) = \big(\operatorname{clip}(v_a / R),\ \operatorname{clip}(v_b / R),\ \operatorname{clip}(v_c / R)\big).$$

The black line connects the true walk $P(w_0), \dots, P(w_K)$. The prediction line evaluates the network at every selected region center, $\hat{y}_r = f_\theta(\operatorname{center}(R_r))$, and connects those predictions in walk order. Prediction color uses full mean squared error:

$$E_r = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{y}_{rj} - y_{rj}\right)^2.$$

This color error is twice the half-squared loss displayed in the train and test metrics. Hovering a network node instead colors the true walk by that node's current activation at every region center.
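The projection and the color error can be sketched together. The clipping interval is not stated in the text, so this Python sketch assumes clip maps to [-1, 1]; the function names are hypothetical.

```python
def clip(value, lo=-1.0, hi=1.0):
    """Clamp value to [lo, hi]; the [-1, 1] default is an assumption."""
    return max(lo, min(hi, value))

def project(v, dims, R):
    """Pick output dimensions (a, b, c), scale by range R, and clip."""
    a, b, c = dims
    return (clip(v[a] / R), clip(v[b] / R), clip(v[c] / R))

def color_error(prediction, target):
    """Full mean squared error over the N output dimensions of one region."""
    N = len(target)
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / N
```

For a prediction of (1, 0) against a target of (0, 0), `color_error` gives 0.5, while the half-squared loss for the same sample is 0.25.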

7. OOD MSE plane

For every point q on the selected output-space slice, the plane displays

$$E_{\text{plane}}(q) = \min_{r} \frac{1}{N} \left\lVert \hat{y}_r - q \right\rVert_2^2.$$

This measures distance from a hypothetical output point to the nearest predicted region-center output. It does not evaluate the network on out-of-distribution inputs and is not a direct ground-truth OOD error.
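Computed for one point, the plane value is a nearest-neighbor search over the predicted region-center outputs. The Python sketch below uses hypothetical names and assumes q is given as a full N-dimensional output point lying on the chosen slice.

```python
def plane_value(q, predicted_centers):
    """Smallest mean squared distance from q to any predicted region-center output."""
    N = len(q)
    return min(
        sum((p - qi) ** 2 for p, qi in zip(prediction, q)) / N
        for prediction in predicted_centers
    )
```

Note that no network evaluation happens here: the predictions are computed once at the region centers, and the plane only measures how far each output-space point lies from them.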

8. Automated architecture sweeps

The depth sweep tests 2, 4, 6, and 8 hidden layers at width 16. The width sweep tests 4, 8, 16, and 32 neurons at depth 4. Both use K ∈ {2, 3, 4, 5} and train for the configured number of epochs. Data is shuffled into an 80/20 sample split, so samples from the same selected region can occur in both train and test sets.
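The sweep grid and the sample-level split can be sketched as follows. This is an illustration with hypothetical names, and it assumes that each sweep crosses every K value with every depth or width value, which the text implies but does not state outright.

```python
import random

def sweep_configs():
    """Enumerate depth-sweep and width-sweep settings for K in {2, 3, 4, 5}."""
    ks = (2, 3, 4, 5)
    depth = [{"sweep": "depth", "K": k, "depth": d, "width": 16}
             for k in ks for d in (2, 4, 6, 8)]
    width = [{"sweep": "width", "K": k, "depth": 4, "width": w}
             for k in ks for w in (4, 8, 16, 32)]
    return depth + width

def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle individual samples, then cut 80/20; regions are not kept apart."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Because the split is taken over samples rather than regions, test loss here mostly measures how well the network fits regions it has already seen, not how it generalizes to unseen regions.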

Um, What Is a Neural Network?

It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” is created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start. For a more technical overview, try Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

This Is Cool, Can I Repurpose It?

Please do! We’ve open sourced it on GitHub with the hope that it can make neural networks a little more accessible and easier to learn. You’re free to use it in any way that follows our Apache License. And if you have any suggestions for additions or changes, please let us know.

We’ve also provided some controls below to enable you to tailor the playground to a specific topic or lesson. Just choose which features you’d like to be visible below, then save this link or refresh the page.

What Do All the Colors Mean?

Orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.

The data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one, respectively.

In the hidden layers, the lines are colored by the weights of the connections between neurons. Blue shows a positive weight, which means the network is using that output of the neuron as given. An orange line shows that the network is assigning a negative weight.

In the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.

What Library Are You Using?

We wrote a tiny neural network library that meets the demands of this educational visualization. For real-world applications, consider the TensorFlow library.

Credits

This was created by Daniel Smilkov and Shan Carter. This is a continuation of many people’s previous work — most notably Andrej Karpathy’s convnet.js demo and Chris Olah’s articles about neural networks. Many thanks also to D. Sculley for help with the original idea and to Fernanda Viégas and Martin Wattenberg and the rest of the Big Picture and Google Brain teams for feedback and guidance.