A Neural Network Playground

How Does N-Dimensional Random-Walk Regression Work?

The random-walk mode trains a neural network to learn a vector-valued function f_θ: ℝ^N → ℝ^N. Inputs come from sparse, randomly selected cells in an N-dimensional grid, and each selected cell is assigned one position from an N-dimensional random walk.

1. Input regions

The input domain [-1, 1]^N is divided into K bins per dimension, producing K^N possible hypercubic regions. The UI limits K so that K^N ≤ 10,000. Only K of these regions are selected at random.

For region index r, dimension j has grid coordinate

c_j = floor(r / K^j) mod K.

A coordinate is sampled uniformly inside that cell:

x_j ~ U(-1 + 2c_j/K, -1 + 2(c_j + 1)/K).

Each selected region receives exactly M samples, so the complete dataset contains S = KM examples.
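
The region decoding and sampling above can be sketched in a few lines of Python. This is a minimal illustration, not the playground's actual code: it assumes that j is counted from 0 (so that c_0 is the lowest base-K digit of r) and that the K regions are drawn without replacement. The function names are hypothetical.

```python
import random

def region_coords(r: int, K: int, N: int) -> list[int]:
    """Decode region index r into grid coordinates c_j = floor(r / K**j) mod K.

    Assumption: j runs over 0..N-1, i.e. r is read as an N-digit base-K number.
    """
    return [(r // K**j) % K for j in range(N)]

def sample_in_region(r: int, K: int, N: int, rng: random.Random) -> list[float]:
    """Draw one point uniformly inside the hypercubic cell with index r."""
    return [rng.uniform(-1 + 2 * c / K, -1 + 2 * (c + 1) / K)
            for c in region_coords(r, K, N)]

def build_inputs(K: int, N: int, M: int, seed: int = 0):
    """Pick K distinct regions out of K**N and draw M samples in each (S = K*M)."""
    rng = random.Random(seed)
    regions = rng.sample(range(K**N), K)  # assumption: no region is picked twice
    return [(r, sample_in_region(r, K, N, rng)) for r in regions for _ in range(M)]

if __name__ == "__main__":
    data = build_inputs(K=4, N=3, M=5)
    print(len(data), "samples; first:", data[0])
```

With K = 4, N = 3, and M = 5 the sketch yields S = 20 samples spread over 4 of the 64 possible cells.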

2. Target random walk

The output walk starts at w₀ = (0, ..., 0). At each step, the generator chooses a dimension J_t and direction s_t ∈ {-1, +1}, then proposes

w_t = w_{t-1} + s_t (2/K) e_{J_t}.

Moves outside [-1, 1]^N are rejected. There are K walk steps. The selected input regions are randomly permuted and assigned the unique targets w₁, ..., w_K. Therefore, nearby input regions do not necessarily receive nearby walk targets: the task is a sparse, piecewise-constant mapping x ∈ R_r ⇒ y = w_{π(r)}.
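
A rough Python sketch of the walk and the target assignment follows. The text does not say what happens after a rejected move, so this sketch assumes the walk stays in place and the step still counts; the function names and the small floating-point tolerance are also assumptions made for illustration.

```python
import random

def random_walk(K: int, N: int, seed: int = 0) -> list[list[float]]:
    """Return [w_0, ..., w_K] for a lattice walk with step size 2/K in [-1, 1]^N."""
    rng = random.Random(seed)
    step = 2.0 / K
    w = [0.0] * N
    walk = [list(w)]
    for _ in range(K):
        j = rng.randrange(N)        # dimension J_t
        s = rng.choice((-1, 1))     # direction s_t
        proposal = w[j] + s * step
        # Assumption: a rejected move leaves the walk where it was.
        if abs(proposal) <= 1.0 + 1e-12:
            w[j] = proposal
        walk.append(list(w))
    return walk

def assign_targets(regions: list[int], walk: list[list[float]], seed: int = 0):
    """Shuffle the selected regions and give the t-th one the target w_{t+1}."""
    rng = random.Random(seed)
    order = list(regions)
    rng.shuffle(order)
    return {r: walk[t + 1] for t, r in enumerate(order)}

if __name__ == "__main__":
    walk = random_walk(K=5, N=3)
    print(assign_targets([3, 17, 42, 58, 99], walk))
```

Because the regions are shuffled before assignment, two adjacent cells can end up with targets from opposite ends of the walk.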

3. Optional Gaussian target noise

When Gaussian noise is enabled, every output coordinate becomes

ỹ_j = y_j + ε_j, ε_j ~ N(μ, σ²).

The variance control is σ²; the implementation scales a standard-normal sample by the square root of that value.
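
As a small illustration of the scaling described above, the following Python sketch adds noise to a target vector. The helper name `add_noise` is hypothetical; the only behavior taken from the text is that a standard-normal draw is multiplied by √(σ²) and shifted by μ.

```python
import math
import random

def add_noise(y: list[float], mean: float, variance: float,
              rng: random.Random) -> list[float]:
    """Return y_j + eps_j with eps_j ~ N(mean, variance), independently per coordinate."""
    sigma = math.sqrt(variance)  # the UI control is the variance, not the std. dev.
    return [yj + mean + sigma * rng.gauss(0.0, 1.0) for yj in y]

if __name__ == "__main__":
    rng = random.Random(0)
    print(add_noise([0.4, -0.4, 0.0], mean=0.0, variance=0.01, rng=rng))
```

Setting the variance to 0.01 therefore perturbs each coordinate with a standard deviation of 0.1, not 0.01.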

4. Network and loss

Without an architecture frontend, the network shape is N → h₁ → ... → h_L → N. Hidden layers use the selected activation and the output layer is linear:

z^(ℓ) = W^(ℓ)a^(ℓ-1) + b^(ℓ), a^(ℓ) = φ(z^(ℓ)).

The displayed train and test loss averages half-squared error over samples and output dimensions:

L = (1/(SN)) Σ_{i=1}^{S} Σ_{j=1}^{N} ½(ŷ_{ij} - y_{ij})².

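The output derivative used by backpropagation is the per-element derivative of the half-squared error, ∂[½(ŷ_ij - y_ij)²]/∂ŷ_ij = ŷ_ij - y_ij. The 1/(SN) factor is not included at this point; the code applies its batch averaging afterwards.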
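
The forward pass, the displayed loss, and the output delta can be written out as a short Python sketch. It is only an illustration under assumptions: weights are plain nested lists, tanh stands in for whichever activation is selected, and the function names are hypothetical rather than taken from the playground's library.

```python
import math

def forward(x, layers, phi=math.tanh):
    """Apply z = W a + b per layer; phi on hidden layers, linear output layer.

    `layers` is a list of (W, b) pairs, W given as a list of rows.
    """
    a = list(x)
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + bi for row, bi in zip(W, b)]
        a = z if idx == len(layers) - 1 else [phi(v) for v in z]
    return a

def half_mse(pred, target):
    """Half-squared error averaged over S samples and N output dimensions."""
    S, N = len(pred), len(pred[0])
    total = sum(0.5 * (p - t) ** 2
                for p_row, t_row in zip(pred, target)
                for p, t in zip(p_row, t_row))
    return total / (S * N)

def output_delta(pred_row, target_row):
    """Per-element derivative of the half-squared error, before any averaging."""
    return [p - t for p, t in zip(pred_row, target_row)]

if __name__ == "__main__":
    layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer, 2 -> 2
              ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])]   # linear output, 2 -> 2
    y_hat = forward([0.5, -0.5], layers)
    print(y_hat, half_mse([y_hat], [[1.0, -1.0]]))
```

The factor ½ is what makes the delta come out as a plain difference with no leading 2.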

5. Optional fixed frontends

The CNN frontend maps the input to eight deterministic width-two filter features:

φ_f(x) = (1/P) Σ_p [b_f + Σ_q ω_{fq} x_{p+q}].

The Transformer frontend embeds coordinates in 16 dimensions, applies four-head scaled dot-product attention, a residual ReLU transform, and mean pooling. Its attention weights have the usual form

α_{ij} = softmax_j(q_i^T k_j / √4).

These frontend coefficients are deterministic sine-generated constants. They are feature transforms, not trainable CNN or Transformer parameters; only the following dense network learns.
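
The two formulas above can be sketched in Python as follows. Several details are assumptions: the filter slides only over positions where it fully fits (so P = N - 1 for a width-two filter), the divisor √4 is read as the per-head dimension 16/4 = 4, and the sine expression used to fill in example coefficients is a made-up stand-in, since the text does not give the actual generating formula.

```python
import math

def cnn_feature(x, omega, bias):
    """Mean over positions p of bias + sum_q omega[q] * x[p + q].

    Assumption: no padding, so there are P = len(x) - len(omega) + 1 positions.
    """
    P = len(x) - len(omega) + 1
    return sum(bias + sum(w * x[p + q] for q, w in enumerate(omega))
               for p in range(P)) / P

def attention_weights(q_i, keys, d_head=4):
    """Softmax over j of (q_i . k_j) / sqrt(d_head) for a single head."""
    scores = [sum(a * b for a, b in zip(q_i, k)) / math.sqrt(d_head) for k in keys]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    x = [0.2, -0.7, 0.5]
    # Hypothetical fixed coefficients; NOT the playground's actual constants.
    filters = [([math.sin(f + 1), math.sin(2 * f + 2)], math.sin(3 * f + 3))
               for f in range(8)]
    print([round(cnn_feature(x, w, b), 3) for w, b in filters])
    print(attention_weights([1, 0, 0, 0], [[1, 0, 0, 0], [0, 1, 0, 0]]))
```

Nothing in either function is updated during training, which mirrors the point that only the dense network after the frontend learns.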

6. 3D output visualization

For selected output dimensions a, b, and c, a vector v is projected using range R:

P(v) = (clip(v_a/R), clip(v_b/R), clip(v_c/R)).

The black line connects the true walk P(w₀), ..., P(w_K). The prediction line evaluates the network at every selected region center, ŷ_r = f_θ(center(R_r)), and connects those predictions in walk order. Prediction color uses full mean squared error:

E_r = (1/N) Σ_{j=1}^{N} (ŷ_{rj} - y_{rj})².

For the same predictions and targets, this color error is twice the half-squared error used in the train and test metrics. Hovering a network node instead colors the true walk by that node's current activation at every region center.
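
A minimal Python sketch of the projection and the per-region color error is shown below. The clipping interval is not stated in the text, so the sketch assumes clip limits values to [-1, 1]; the function names are illustrative.

```python
def clip(u: float) -> float:
    """Assumption: clip restricts a scaled coordinate to [-1, 1]."""
    return max(-1.0, min(1.0, u))

def project(v, dims, R):
    """Project vector v onto the selected output dimensions (a, b, c) with range R."""
    return tuple(clip(v[d] / R) for d in dims)

def region_mse(pred, target):
    """Full mean squared error E_r over the N output dimensions of one region."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

if __name__ == "__main__":
    w = [0.5, 3.0, -4.0, 0.1]
    print(project(w, dims=(0, 1, 2), R=2.0))   # coordinates beyond R saturate at ±1
    print(region_mse([1.0, 2.0], [0.0, 0.0]))
```

Coordinates whose magnitude exceeds R are pinned to the edge of the plot rather than dropped, so a badly wrong prediction still appears on screen.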

7. OOD MSE plane

For every point q on the selected output-space slice, the plane displays

E_plane(q) = min_r (1/N)||ŷ_r - q||₂².

This measures distance from a hypothetical output point to the nearest predicted region-center output. It does not evaluate the network on out-of-distribution inputs and is not a direct ground-truth OOD error.
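
The plane value is a nearest-neighbor distance in output space, which the following Python sketch makes explicit. The function name and the example points are hypothetical; the formula is the one given above.

```python
def plane_error(q, predictions):
    """min over regions r of (1/N) * ||y_hat_r - q||^2 for an output-space point q."""
    N = len(q)
    return min(sum((p - c) ** 2 for p, c in zip(pred, q)) / N
               for pred in predictions)

if __name__ == "__main__":
    region_center_outputs = [[0.0, 0.0], [1.0, 1.0]]   # y_hat_r for two regions
    print(plane_error([1.0, 0.0], region_center_outputs))
    print(plane_error([1.0, 1.0], region_center_outputs))
```

Note that the network is never called inside `plane_error`: the inputs are output-space points and already-computed predictions, which is why the plane is not a ground-truth OOD error.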

8. Automated architecture sweeps

The depth sweep tests 2, 4, 6, and 8 hidden layers at width 16. The width sweep tests 4, 8, 16, and 32 neurons at depth 4. Both use K ∈ {2, 3, 4, 5} and train for the configured number of epochs. Data is shuffled into an 80/20 sample split, so samples from the same selected region can occur in both train and test sets.
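
The consequence of a sample-level split can be illustrated with a short Python sketch. It assumes the split is a plain shuffle followed by an 80/20 cut; the helper names and the (region, sample) tuple layout are hypothetical.

```python
import random

def split_80_20(samples, seed: int = 0):
    """Shuffle individual samples, then cut the list 80/20 into train and test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def shared_regions(train, test):
    """Regions that contribute samples to both sides of the split."""
    return {r for r, _ in train} & {r for r, _ in test}

if __name__ == "__main__":
    # 4 regions with 10 samples each; every sample is tagged with its region id.
    samples = [(r, i) for r in range(4) for i in range(10)]
    train, test = split_80_20(samples)
    print(len(train), len(test), sorted(shared_regions(train, test)))
```

Since the targets are constant within a region, a test sample from a shared region measures how well that region was memorized rather than how well the network generalizes to unseen regions.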

Um, What Is a Neural Network?

It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” is created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start. For a more technical overview, try Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

This Is Cool, Can I Repurpose It?

Please do! We’ve open sourced it on GitHub with the hope that it can make neural networks a little more accessible and easier to learn. You’re free to use it in any way that follows our Apache License. And if you have any suggestions for additions or changes, please let us know.

We’ve also provided some controls below to enable you to tailor the playground to a specific topic or lesson. Just choose which features you’d like to be visible below, then save this link or refresh the page.

What Do All the Colors Mean?

Orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.

The data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one, respectively.

In the hidden layers, the lines are colored by the weights of the connections between neurons. Blue shows a positive weight, which means the network is using that output of the neuron as given. An orange line shows that the network is assigning a negative weight.

In the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.

What Library Are You Using?

We wrote a tiny neural network library that meets the demands of this educational visualization. For real-world applications, consider the TensorFlow library.

Credits

This was created by Daniel Smilkov and Shan Carter. This is a continuation of many people’s previous work — most notably Andrej Karpathy’s convnet.js demo and Chris Olah’s articles about neural networks. Many thanks also to D. Sculley for help with the original idea and to Fernanda Viégas and Martin Wattenberg and the rest of the Big Picture and Google Brain teams for feedback and guidance.

Tinker With a Neural Network Right Here in Your Browser.
Don’t Worry, You Can’t Break It. We Promise.
