Langotime Aionoscope · Representation Geometry

The shape of a signal

We all love colourful pictures, but interpretability isn't just that — it's an engineering tool. The better you understand how a model represents the world, the better you can fix it. Neural networks build their representations as geometry, and time-series foundation models are a particularly easy place to watch it happen. Sweep the phase of a sine wave or slide a spike across time, and you can read straight off the frozen activations whether the model traced the right shape — and where it fell short. Spot exactly where it breaks, and you can design a pointed fix — and make sure nothing else breaks in the process.

What does that look like? When a model truly grasps a factor, the shape of the data shows up as the shape of the activations: the phase of a sine closes into a loop, a spike's position stretches into a line, a trend's slope fans out along an axis. (That framing — world-structure as activation-geometry — comes from Goodfire's Neural Geometry series, made for language models; we highly recommend reading it.)

Time series are the natural place to look. Language interpretability has to hunt for structure in a discrete, tangled space of tokens and concepts; time series hand it to you. The generative factors are continuous — the phase and frequency of a sine, the position of a spike, the slope of a trend — and a perfectly controlled sweep is free to make: step one knob across a fine grid, switch everything else off, and watch, layer by layer, what the representation does. Ground truth isn't inferred from correlations; it's the dial you turned.

That turns geometry into a practical diagnostic — which is what the Langotime Aionoscope manifold suite measures and its dashboard lets you explore. Using Toto-2.0-2.5B (Datadog's 2.5-billion-parameter forecasting model) as the first running example, this post explains the project, how to read the charts, and where the geometry exposes a model's blind spots — like a factor Toto's tokenizer quietly discards.

Why we built it

Decodable is not the same as organized

The standard way to interrogate a frozen representation is a linear probe: train a small linear readout and ask "can I recover the phase / the spike position / the slope from these activations?" It's a great question, and we still run it as our primary benchmark. But it only tells you whether a variable is linearly decodable. It is silent on a deeper question:

Does the representation arrange examples according to the true shape of the process that generated them?

Linear decodability can't settle that. A probe only needs one direction where the factor reads out — so it can score high while the manifold is curved, folded, or has its neighbours scrambled in every other direction, and it can score low on structure that's perfectly real but nonlinear, like a circle. Decodable and geometrically clean are different questions. We wanted to see the shape directly, so for each model we run a controlled experiment:

  • Sweep one factor. Pick a generative knob (say, sine_phase) and step it across a fine 1,024-point grid, holding everything else fixed. Every other component is switched off (num_enabled = 1), so the only thing changing is the factor we care about.
  • Collect frozen features. Push each signal through the model and read out the hidden activations at every layer. No fine-tuning, no probe training — just the representations the model already has.
  • Build the manifold. Each grid point gives one activation vector, its centroid. (We use one sample per grid point here, so the "centroid" is just a single activation vector; we keep the name to stay in line with the general protocol, where several samples are collected at each grid point and averaged.) Ordered by the swept value, those centroids trace a path through representation space. That path is the object we score.
  • Measure the geometry. Does the path preserve distances? Local neighbourhoods? The cyclic topology of a phase, or the monotone order of a spike position? Does it close into a loop when it should?
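
The steps above can be sketched in a few lines. This is a minimal sketch, not the suite's code: the 1,024-point grid and the 512-sample, 500 Hz window are the values quoted in this post, but the 5 Hz carrier, the names `phase_sweep` and `centroid_path`, and the `toy_encode` stand-in (which is not a real model) are assumptions made for illustration.

```python
import numpy as np

def phase_sweep(n_grid=1024, n_samples=512, fs_hz=500.0, freq_hz=5.0):
    """Return (phases, signals): one pure sine per phase value on the grid."""
    # endpoint=False: phase 0 and phase 2*pi are the same signal, so 2*pi is left out.
    phases = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    t = np.arange(n_samples) / fs_hz
    signals = np.sin(2.0 * np.pi * freq_hz * t[None, :] + phases[:, None])
    return phases, signals

def centroid_path(signals, encode):
    """Encode each signal with a frozen encoder; rows stay in swept-value order."""
    # `encode` is a hypothetical stand-in for "run the model, read one layer".
    return np.stack([encode(x) for x in signals])

def toy_encode(x, fs_hz=500.0, freq_hz=5.0):
    """Toy encoder (assumption, not a real model): project onto a sine/cosine pair."""
    t = np.arange(x.shape[0]) / fs_hz
    basis = np.stack([np.sin(2.0 * np.pi * freq_hz * t),
                      np.cos(2.0 * np.pi * freq_hz * t)])
    return basis @ x / x.shape[0]

phases, signals = phase_sweep()
path = centroid_path(signals, toy_encode)  # one 2-D "centroid" per grid point
```

With a real model, `encode` would return the hidden activations at one layer, and `path` would be the object that gets scored.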

We do this for 45 foundation models (Toto, Moirai, Chronos, TiRex, Mantis, TimesFM, Sundial, LeNEPA and more) across seven generative factors, at every layer.

How to read the dashboard

Three views, one manifold

Pick a model, a target (the swept factor) and a layer, and the dashboard shows the same manifold from three angles. The rest of this post walks through each one with a real example.

  • Centroid path — the manifold itself, projected from its 64 PCA dimensions down to 2D or 3D in your browser. Each dot is one grid point, coloured by the value of the swept factor. A clean shape here is the headline result.
  • Metrics across layers — line charts of each geometry score against depth, so you can see where in the network the structure lives and how it changes.
  • Distance scatter & heatmap — the raw evidence behind the scores: for every pair of grid points, true distance in factor-space vs. distance in the representation, measured both as a straight line and as a walk along the manifold.

The charts below are live: they stream the very same JSON artifacts the dashboard reads, and render as you scroll.

Scope & caveats

What Langotime Aionoscope measures, and what it doesn't

  • This is a representation-geometry evaluation, not a steering or causal-intervention result. We read frozen activations; we never replace them or continue generation from an edited state.
  • It complements the linear-probe leaderboard — it does not replace it. A manifold can be clean but curved, or decodable but tangled; the two views answer different questions.
  • The numbers are honest about geometry, not downstream accuracy. A high projection R² means a coordinate is recoverable by following the manifold — not that the model forecasts well.
  • It is deliberately simple: controlled one-factor sweeps with everything else switched off, classical PCA and shortest-path geodesics. No UMAP or t-SNE in the metrics — only as optional eye-candy, never as a score.

How the four scores are computed

Every score starts from the same ingredients. Each grid point gives one centroid — its activation vector, reduced by PCA to 64 dimensions. For each pair of grid points \((i,j)\) we measure three distances:

  • True distance \(d^{\text{true}}_{ij}\) — how far apart the two swept values actually are: \(|v_i - v_j|\), or the shorter way round the circle \(\min(|v_i-v_j|,\,P-|v_i-v_j|)\) for a cyclic factor like phase. This is the ground truth the metric names call “latent.”
  • Straight-line distance \(d^{\text{lin}}_{ij}=\lVert c_i-c_j\rVert_2\) — Euclidean distance between the two centroids in the 64-D representation.
  • Geodesic distance \(d^{\text{geo}}_{ij}\) — shortest path from \(i\) to \(j\) along a \(k\)-nearest-neighbour graph (edges weighted by straight-line distance; \(k\) picked from \(\{4,6,8\}\) to maximise the score). Distance along the manifold rather than straight through it.
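
The three distances above can be sketched with NumPy alone. This is a sketch under stated assumptions: the function names are hypothetical, the k-nearest-neighbour graph is made symmetric, and a dense Floyd-Warshall pass stands in for whatever shortest-path routine manifold_eval.py actually uses (the post does not say). The selection of \(k\) from \(\{4,6,8\}\) is left out.

```python
import numpy as np

def true_distance(values, period=None):
    """|v_i - v_j|, or the shorter way round the circle when `period` is given."""
    diff = np.abs(values[:, None] - values[None, :])
    if period is not None:
        diff = np.minimum(diff, period - diff)
    return diff

def linear_distance(centroids):
    """Euclidean distance between every pair of centroids."""
    delta = centroids[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(delta, axis=-1)

def geodesic_distance(d_lin, k):
    """Shortest-path distance along a symmetric k-nearest-neighbour graph."""
    n = d_lin.shape[0]
    masked = d_lin.copy()
    np.fill_diagonal(masked, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(masked, axis=1)[:, :k]    # k nearest neighbours per point
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    graph[rows, cols] = d_lin[rows, cols]       # edges weighted by straight-line distance
    graph[cols, rows] = d_lin[cols, rows]       # keep the graph undirected
    for m in range(n):                          # Floyd-Warshall relaxation
        graph = np.minimum(graph, graph[:, m:m + 1] + graph[m:m + 1, :])
    return graph

# Toy check on a 16-point ring: walking round is longer than cutting across.
angles = 2.0 * np.pi * np.arange(16) / 16
ring = np.stack([np.cos(angles), np.sin(angles)], axis=1)
d_true = true_distance(angles, period=2.0 * np.pi)
d_lin = linear_distance(ring)
d_geo = geodesic_distance(d_lin, k=2)
```

On the toy ring, opposite points are a diameter apart in a straight line but half a circumference apart along the graph, which is the distinction the two rulers are meant to capture.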

Then, over all \(N(N-1)/2\) pairs of grid points:

  • Spearman vs. linear \(=\rho_s\!\big(d^{\text{true}},\,d^{\text{lin}}\big)\) — rank correlation between the true distances and the straight-line distances. \(1.0\) means straight lines order every pair exactly as the true factor does.
  • Spearman vs. geodesic \(=\rho_s\!\big(d^{\text{true}},\,d^{\text{geo}}\big)\) — the same, but with distance measured along the manifold.
  • 5-NN recall \(=\dfrac{1}{N}\sum_i \dfrac{\big|\,\mathcal{N}^{\text{true}}_5(i)\cap\mathcal{N}^{\text{lin}}_5(i)\,\big|}{5}\) — for each point, the fraction of its 5 true nearest neighbours that are also among its 5 nearest in the representation, averaged over all points. Pure local faithfulness.
  • Geodesic gain \(=(\text{Spearman vs. geodesic})-(\text{Spearman vs. linear})\) — a difference: how much pairwise order you recover by walking along the manifold instead of cutting straight across. Large and positive ⇒ the shape is curved or folded but locally faithful.

All three distance matrices are built from the same centroids. The Spearman scores use the upper-triangular pairs \((i<j)\); recall is averaged over points, as in its formula above. Computed in manifold_eval.py.
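
Given the three distance matrices, the four scores reduce to a few lines. The sketch below is an illustration of the formulas above, not a copy of manifold_eval.py: the function names are hypothetical, tied distances share their mean rank in the Spearman step, and ties in the nearest-neighbour step are broken by index order (an assumption; the post does not say how ties are handled).

```python
import numpy as np

def average_ranks(x):
    """Ranks of a 1-D array; tied values share their mean rank."""
    order = np.argsort(x, kind="stable")
    ordinal = np.empty(x.shape[0], dtype=float)
    ordinal[order] = np.arange(x.shape[0], dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    tie_sums = np.bincount(inverse, weights=ordinal)
    return tie_sums[inverse] / counts[inverse]

def spearman(d_a, d_b):
    """Spearman rank correlation over the upper-triangular pairs (i < j)."""
    iu = np.triu_indices(d_a.shape[0], k=1)
    return float(np.corrcoef(average_ranks(d_a[iu]), average_ranks(d_b[iu]))[0, 1])

def knn_recall(d_true, d_rep, k=5):
    """Mean fraction of each point's k true neighbours also found in the representation."""
    n = d_true.shape[0]
    hits = 0
    for i in range(n):
        others = np.delete(np.arange(n), i)   # exclude the point itself
        true_nn = others[np.argsort(d_true[i, others], kind="stable")[:k]]
        rep_nn = others[np.argsort(d_rep[i, others], kind="stable")[:k]]
        hits += len(set(true_nn.tolist()) & set(rep_nn.tolist()))
    return hits / (n * k)

def geometry_scores(d_true, d_lin, d_geo):
    s_lin = spearman(d_true, d_lin)
    s_geo = spearman(d_true, d_geo)
    return {
        "spearman_linear": s_lin,
        "spearman_geodesic": s_geo,
        "recall_at_5": knn_recall(d_true, d_lin, k=5),
        "geodesic_gain": s_geo - s_lin,
    }

# Toy case: the representation is the factor itself, so every score is ideal.
values = np.arange(12.0)
d_line = np.abs(values[:, None] - values[None, :])
ideal = geometry_scores(d_line, d_line, d_line)
```

A curved or folded representation would show up here as a lower `spearman_linear` with `spearman_geodesic` still high, that is, a positive `geodesic_gain`.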

Example 1 · sine_phase

A phase is a circle

Start with the cleanest case. The phase of a sine wave is a circular quantity: phase 0 and phase 2π are the same signal. If a model truly understands phase, its centroids should trace a closed loop — and points near the start should sit right next to points near the end.

Here is Toto-2.0-2.5B at layer 4, sweeping phase across a full cycle. Drag the slider (or hit play): on the left, the input sine slides along; on the right, a dot rides around the manifold. Colour runs from low phase (blue) to high (red).

[Interactive figure: Input signal ↔ manifold · Toto-2.0-2.5B · sine_phase · layer 4. Sweep one factor; watch the representation move. Panels: input signal (1.024 s @ 500 Hz); representation manifold (PCA).]
As phase advances 0 → 2π the wave slides by exactly one period and the dot travels once around a near-perfect ring, returning to its start (cycle-closure error 0.000, circular-order 1.000, isometry ≈ 1.0). Phase, for Toto, really is a circle.

One subtlety the layer view reveals: a shape can be globally right but locally fuzzy. The ring is preserved at every depth, yet the fine-grained ordering of immediate neighbours blurs in the deeper layers.

[Figure: Metrics across layers · Toto-2.0-2.5B · sine_phase. Higher is better; 49 layers.]
Distance preservation (Spearman vs. linear and vs. geodesic) stays pinned near 1.0 from the very first layer to the last — the circle is robust. But 5-NN recall peaks early (≈0.90 at layer 0) and sags toward the middle and end of the network: the manifold keeps its shape while the model trades away some local resolution for whatever the forecasting objective needs downstream.

None of this is automatic. A clean ring is something a model has to build — and not every one does. Here is NuTime-Bias9 on the identical phase sweep, at layer 1. Drag the slider (or rotate the 3-D view) and watch the same dot try to ride the same loop.

[Interactive figure: Input signal ↔ manifold · NuTime-Bias9 · sine_phase · layer 1. Same sweep, a very different shape. Panels: input signal (0.352 s @ 500 Hz); representation manifold (PCA, rotatable).]
Same sweep, same colours — but the dot threads through a tangle instead of riding a ring. The phase is still in there: immediate neighbours are mostly preserved (5-NN recall ≈ 0.86). It just isn't the dominant shape. The single biggest axis of variation — 63% of it — has nothing to do with phase, and the loop is folded across four separate directions, so the cleanest projection you can draw is a knot, not a circle (circular-order 0.82 vs Toto's 1.000). NuTime even starts a step further back: its tokenizer at layer 0 flattens every phase onto the same point, and the network spends its first layers re-inflating the circle. It does untangle with depth — but watch how far it actually gets (next figure). Same factor, same probe; the geometry here is just a lot messier, and the shape tells you so at a glance.

Give it a few layers, though, and it recovers. By layer 4 NuTime has found the circle again — just not Toto's circle.

[Interactive figure: Input signal ↔ manifold · NuTime-Bias9 · sine_phase · layer 4. Three layers later: a loop again, but a lumpy one. Panels: input signal (0.352 s @ 500 Hz); representation manifold (PCA, rotatable).]
Three layers on, the tangle has resolved into a loop: the colours now run in order all the way around (circular-order 0.99, up from 0.82), and NuTime has even swung phase onto its main axis — the direction that was pure noise at layer 1. But put it next to Toto and the gap is obvious. This is a lumpy, three-dimensional loop, not a flat ring — you need a third axis to see it whole (the top two capture only half of it), and it overshoots its own start: the ends miss each other by about twice a normal step (closure ratio 2.0 vs Toto's 1.0). NuTime does get to a circle — it just takes a few layers, and never lands as clean as Toto, which already had a flat, closed ring at layer 0.
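
The post quotes a closure ratio without spelling out its formula. One plausible reading, consistent with "the ends miss each other by about twice a normal step", is the gap between the last and first centroid divided by the typical neighbour step. The sketch below assumes that reading; the name `closure_ratio` and the use of the median step are assumptions, not the suite's definition.

```python
import numpy as np

def closure_ratio(centroids):
    """Closing gap (last point back to first) relative to the median neighbour step."""
    steps = np.linalg.norm(np.diff(centroids, axis=0), axis=1)
    closing_gap = np.linalg.norm(centroids[-1] - centroids[0])
    return float(closing_gap / np.median(steps))

# A full sweep on an evenly sampled ring closes with one ordinary step.
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
ring = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# The same sweep mapped onto only half a ring leaves a wide gap at the end.
half = np.stack([np.cos(angles / 2.0), np.sin(angles / 2.0)], axis=1)

ring_ratio = closure_ratio(ring)
half_ratio = closure_ratio(half)
```

Under this reading a ratio near 1 means the loop closes as smoothly as it advances, and larger values mean the ends miss each other.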

Example 2 · spike_time_frac

When a straight line lies

Now slide a single spike from the start of the window to the end. The factor is just its position in time — a plain interval from 0 to 1, exactly the kind of thing you'd expect to come out as a tidy line. At the input-facing layer 0 it comes out as a tangled triangle instead, with almost every position crushed into one corner. Drag the spike across time and watch the dot leap around it — then read on for why.

  • 0.12: straight-line distance order (Spearman), which looks almost random
  • 0.97: distance-along-the-manifold order (geodesic), nearly perfect
  • +0.85: geodesic gain; straight-line distance is blind to the order

[Interactive figure: Input signal ↔ manifold · Toto-2.0-2.5B · spike_time_frac · layer 0. Colour = spike position in time. Panels: input signal (a single impulse); representation manifold (PCA).]
This isn't a smooth arc — it's the tokenizer's patch grid showing through. Toto reads the signal in 32-sample patches, and a one-sample spike mostly tells a patch where inside it the spike landed. As the spike sweeps, that within-patch position cycles 0→31 over and over — about eleven times across the window — so the time axis folds into ~11 stacked copies. Interior positions pile into the dense corner; the two patch edges fly out to the far corners (fanning by which patch); and the path leaps clear across the figure every time the spike crosses a 32-sample boundary.
[Figure: How far the representation jumps · Toto-2.0-2.5B · spike_time_frac · layer 0. Step between neighbouring spike positions; dashed lines mark the 32-sample patch boundaries.]
Measured directly, the patch grid is unmistakable: between most neighbouring positions the representation barely moves, then it lurches on a strict 32-sample rhythm — the peaks land right on the patch boundaries (dashed). The geometry is quantised to the tokenizer's patches, not to time itself.

It helps to be explicit about how Toto reads a signal, since that is the whole cause, and it is tempting to picture the model taking one number at a time. It doesn't. Just as a language model splits text into tokens, Toto splits the series into fixed-length patches of 32 samples each, and the transformer sees one token per patch, not one per time step. A patch is the model's smallest unit of time, so a single spike has no time stamp of its own; its token records only which patch the spike fell in and where inside that patch it sat, the two coordinates this whole example turns on. Everything jumpy you just watched is the geometry being quantised to that patch grid instead of to time.
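
The two coordinates are just the quotient and remainder of the spike's sample index divided by the patch length. A minimal sketch, using the 32-sample patch and 512-sample window quoted in this post; the function names are made up for illustration and nothing here touches Toto's actual tokenizer:

```python
PATCH_LEN = 32     # patch length quoted in the post
WINDOW_LEN = 512   # window length quoted in the post (16 patches)

def patch_coordinates(sample_index, patch_len=PATCH_LEN):
    """Split a sample index into (which patch, where inside the patch)."""
    return divmod(sample_index, patch_len)

def crosses_boundary(a, b, patch_len=PATCH_LEN):
    """True when two sample positions fall in different patches."""
    return a // patch_len != b // patch_len

coords = [patch_coordinates(n) for n in range(WINDOW_LEN)]

# Positions 5 and 37 are 32 samples apart in time but share the same
# within-patch offset, which is how distant positions end up stacked.
same_offset = patch_coordinates(5)[1] == patch_coordinates(37)[1]

# Sample positions at which a one-step move of the spike changes its patch.
boundary_steps = [n for n in range(1, WINDOW_LEN) if crosses_boundary(n - 1, n)]
```

The within-patch offset repeats every 32 samples, so any representation dominated by that offset folds the time axis onto itself.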

How Toto turns a signal into tokens
[Figure: schematic of Toto's tokenization. A continuous impulse signal, one value per time step, is cut by a 32-sample grid into eight patches (t1 to t8); each patch becomes one token, which is all the transformer sees. The spike's patch and its position inside that patch are highlighted: where inside (fine) maps to the comb's teeth, which patch (coarse) to the dominant 98.5% axis.]
Toto never sees the series one step at a time. It slices the window into fixed-length patches — 32 samples each — and hands the transformer one token per patch. So a lone spike carries just two facts: which patch caught it — its coarse position, the part that later becomes the dominant 98.5% axis — and where inside that patch it sits — its fine position, the spread that draws out each tooth of the comb. (Schematic: 8 patches shown; the window here is 512 samples = 16 patches.)

That folding is exactly why a naïve straight-line distance lies — and it's the single most important idea in the suite, so it's worth seeing in the raw pairwise data. The distance scatter plots, for every pair of grid points, true spike-distance on the x-axis against representation distance on the y-axis. A faithful geometry would form a tight rising band.

[Figure: Distance scatter · Toto-2.0-2.5B · spike_time_frac · layer 0. About 4 MB of pairwise distances.]
Blue (straight-line) is a scrambled cloud — because the folding stacks spikes from different patches on top of each other, two positions far apart in time can sit close together in space. Red (geodesic), distance measured along the path, threads through the folds in time order and snaps into a clean rising band. Same activations, two rulers — and only one sees through the patch folding.

So is the model stuck with this folded mess? No — with depth it untangles, and in a revealing way. Watch the straight-line score climb.

[Figure: Metrics across layers · Toto-2.0-2.5B · spike_time_frac. Watch the blue and red lines converge.]
Early on, geodesic order is near-perfect while straight-line order is poor (the folded layer-0 picture) — a big positive geodesic gain. Going deeper, the straight-line score climbs to ~0.99 and the gain falls to zero: the network promotes the coarse position — which patch the spike is in — into a clean, dominant axis, so ordinary distance finally works. The folding doesn't vanish; it gets out-voted by a proper position ruler.

And here is where it gets pretty. By layer 21 the two things the tokenizer was really measuring — which patch, and where inside it — have pulled apart onto separate, orthogonal axes. The manifold turns into a comb: a row of teeth strung along one dominant axis. Rotate it (it defaults to 3-D).

[Interactive figure: Input signal ↔ manifold · Toto-2.0-2.5B · spike_time_frac · layer 21. Left, the impulse; right, the disentangled patch-number × within-patch comb (rotatable). Panels: input signal (a single impulse); representation manifold (PCA).]
The disentangled form of the layer-0 triangle. One axis carries 98.5% of the variance and is simply which patch the spike is in — coarse position, now a clean linear ruler; that's the axis that "straightened" the score above. Along each tooth is where inside the patch the spike sits — the fine position, running from one patch edge to the other. It's a position, not a phase: each tooth is an open segment, not a closed loop — a coordinate that wrapped around would shut into a ring, and this one doesn't. The triangle was these two folded together; here they're orthogonal — which patch across the comb, where-inside along each tooth. (The teeth look this prominent because each 3-D axis is auto-scaled — the within-patch spread is tiny next to the patch axis.)

The spike has a smoother cousin: gaussian_time_frac slides a soft bump across the same window. The bump is wide enough to light up several patches at once, so it never gets folded onto the patch grid the way a one-sample spike does — even at layer 0 its manifold is already a clean line (straight-line order ≈ 0.94 vs. the spike's 0.12). Drag it and compare.

[Interactive figure: Input signal ↔ manifold · Toto-2.0-2.5B · gaussian_time_frac · layer 0. A smooth bump (σ ≈ 0.035 s) sliding across time. Panels: input signal (a Gaussian bump); representation manifold (PCA).]
Same sweep, smoother input — a clean, well-ordered line right from layer 0. A sharp one-sample feature gets quantised by the patch grid; a bump spread across several patches does not. Localized impulses are simply harder for a patch tokenizer to place than smooth ones — and the geometry suite makes that difference visible.

Example 3 · linear_trend_slope

What the tokenizer throws away

The last example is the most subtle, because the geometry doesn't look broken — it looks empty, for a reason that has nothing to do with attention or depth. Before the first transformer layer runs, a time-series model has to turn raw numbers into tokens, and nearly all of them begin the same way: standardize the window — subtract a location, divide by a scale. Toto then squashes the result through asinh, its own signed-log, before patching. That front-end is what lets one model swallow temperatures, prices and megawatts interchangeably. The price is that it erases the absolute scale of the input.

For most factors that is a harmless convenience. Trend slope is the pathological case, because slope is pure scale — exactly what standardization is built to remove. Standardize a clean ramp y = slope·(t − t̄) by its own standard deviation and the slope cancels exactly: every magnitude collapses onto one and the same normalized ramp, and only the sign survives. Sweep from −10⁶ to +10⁶ and the model effectively sees one descending ramp, then one ascending ramp, and — across more than four orders of magnitude — almost nothing in between.
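
The cancellation is easy to check numerically. The sketch below uses a plain z-score (subtract the mean, divide by the standard deviation) followed by asinh as a stand-in for the front-end described above; Toto's actual scaler is not reproduced here, and the helper names are made up for illustration.

```python
import numpy as np

def standardize(x):
    """Plain z-score: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def ramp(slope, n_samples=512):
    """A clean ramp y = slope * (t - t_mean)."""
    t = np.arange(n_samples, dtype=float)
    return slope * (t - t.mean())

gentle = standardize(ramp(3.0))
steep = standardize(ramp(1e6))
falling = standardize(ramp(-3.0))

# The magnitude cancels: a slope of 3 and a slope of a million standardize
# to the same ramp, and only the sign survives.
same_shape = np.allclose(gentle, steep)
mirrored = np.allclose(falling, -gentle)

# A pointwise squash such as asinh cannot bring the magnitude back.
same_after_asinh = np.allclose(np.arcsinh(gentle), np.arcsinh(steep))
```

The reason is one line of algebra: the standard deviation of slope·(t − t̄) is |slope| times the standard deviation of t, so dividing by it leaves sign(slope)·(t − t̄)/std(t).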

You might first blame the measurement. Scored across all 49 layers, the same representation looks nearly geometry-free under a linear ruler yet structured under a signed-log one — so perhaps we were only holding it to the wrong scale.

[Figure: Linear ruler vs. signed-log ruler · Toto-2.0-2.5B · linear_trend_slope. Same activations, two ways of measuring distance.]
Under the linear ruler, geodesic isometry peaks at a feeble ~0.16 and 5-NN recall near 0.06 — the slope manifold looks nearly geometry-free. Under the signed-log ruler the very same activations climb to ~0.54 with recall several times higher. Decodability is similar either way (projection R² ~0.72 vs ~0.67, right) — a probe can read the slope out — so it is tempting to declare that signed-log simply reveals the real geometry. Hold that thought.

Before trusting either number, ask a blunter question: how much does the representation move at all as the slope sweeps? And ask it at layer 0 — the tokenizer's own output, before a single transformer block — so whatever we see is the front-end's doing, not something the network pieced together later. Walk the centroids in grid order and measure the jump between neighbours.

[Figure: How far the representation moves · Toto-2.0-2.5B · linear_trend_slope · layer 0 (tokenizer output). Neighbour-to-neighbour change in the centroid, across the full ±10⁶ sweep.]
Outside a thin band the step size sits at the floating-point floor — neighbouring slopes give identical activations (matching to better than 10⁻¹⁰ once |slope| exceeds about 10⁻⁴). All the motion lives inside |slope| ≲ 5×10⁻⁵ — about 99% of the manifold's whole arc length, crammed into the centre — and it is the scaler's fingerprint, not slope geometry. A standardizer can't divide by zero, so it pins its divisor at a small floor: while the ramp is steep its own spread sets the scale and the slope cancels perfectly, but once the ramp is shallow enough to slip under that floor the division stops cancelling and a shrinking, slope-proportional residue leaks back in. The two ragged shoulders are that residue — the model's jittery response to a ramp dwindling into the floor — while the lone tall spike is the sign flip at dead centre, where the grid steps from the smallest negative slope straight to the smallest positive one and the residual ramp inverts. Every layer tells the same story.
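
The floor mechanism can be reproduced with a toy standardizer. This is a sketch under stated assumptions: the floor value of 1e-2 is hypothetical (the post does not give Toto's), the "representation" is simply the standardized ramp itself, and the function names are invented for illustration.

```python
import numpy as np

def floored_standardize(x, floor=1e-2):
    """Z-score whose divisor is clamped from below (floor value is hypothetical)."""
    scale = max(float(x.std()), floor)
    return (x - x.mean()) / scale

def neighbour_steps(points):
    """Euclidean jump between consecutive rows, in grid order."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1)

t = np.arange(512, dtype=float)
centred = t - t.mean()

# Three slopes shallow enough to sit under the floor, then three steep ones.
slopes = np.array([1e-6, 2e-6, 4e-6, 1.0, 10.0, 100.0])
outputs = np.stack([floored_standardize(s * centred) for s in slopes])
steps = neighbour_steps(outputs)

# steps[0], steps[1]: under the floor the output still scales with the slope.
# steps[3], steps[4]: above the floor the slope cancels and the output freezes.
```

Under the floor the divisor is constant, so the output is proportional to the slope; above it the divisor tracks the slope and the output stops moving, which is the frozen-plus-residue pattern described in the caption.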

You can feel it by hand. Drag the sweep below: the raw ramp swings from dead flat to a 10⁶ cliff — an enormous, obvious change — while the representation cursor stays pinned to one of two points and only comes alive as you cross the centre. Your eye tracks the full dynamic range; the model is holding on to a single sign bit.

[Interactive figure: Input signal ↔ manifold · Toto-2.0-2.5B · linear_trend_slope (signed-log) · layer 0. Left, the literal waveform; right, where it lands in the representation. Panels: input signal (a linear ramp); representation manifold (PCA).]
For almost the entire slider the ramp changes dramatically while the centroid does not move at all — the two are anti-correlated by construction. Only near zero, where standardization stops washing the slope out, does the manifold open up. (The signal y-axis is fixed across the sweep, sized so the steepest ±10⁶ ramps fill the panel; colour on the right = slope.)

Because layer 0 is the tokenizer's output with no transformer block behind it, what collapsed above is the tokenizer's doing, which means a model that tokenizes differently should leave a different trace at the very same stage. Toto's front-end divides the whole window by a single global scale, and for a pure ramp that scale is proportional to the slope's magnitude, so the magnitude cancels. Our LeNEPA model never takes that global quotient: instead of rescaling the whole series at once, it cuts the signal into short patches and embeds each one with a small convolution, normalizing only locally, so the overall size of the trend is never divided away. Here is the identical measurement, same ±10⁶ sweep, at LeNEPA's layer 0: its own tokenizer output, the very stage where Toto's slope had already vanished.

[Figure: How far the representation moves · LeNEPA-CauKer2M · linear_trend_slope · layer 0 (tokenizer output). Same measurement, same ±10⁶ sweep, with a tokenizer that keeps the scale.]
The mirror image of Toto: the step never falls to the floor. The representation moves at every slope across all twelve orders of magnitude, the jump growing smoothly as the slope gets larger (the gentle rise toward the edges). It is not a perfectly even ruler — the spacing compresses near zero and the path is curved rather than straight — but the slope is encoded continuously, everywhere, not collapsed to a sign. (The y-scale is LeNEPA's own embedding units; what matters is the shape, not its height next to Toto's.)

And you can feel that by hand, the same way. Drag the sweep below: this time the cursor never freezes — as the slope scans from one 10⁶ cliff to the other, it slides along the manifold the whole way, the dead extremes included.

[Interactive figure: Input signal ↔ manifold · LeNEPA-CauKer2M · linear_trend_slope (signed-log) · layer 0. Left, the literal waveform; right, where it lands in the representation. Panels: input signal (a linear ramp); representation manifold (PCA).]
The companion to Toto's frozen panel — same factor, same sweep. The input ramp behaves identically (nearly flat near zero, a 10⁶ cliff at the ends), but LeNEPA keeps a distinct place for every slope, so the cursor travels continuously instead of sitting pinned at two points. The whole difference is the tokenizer.

The lesson is about the tokenizer, not the ruler. Any factor that a standardizing front-end is built to remove, whether a scale (trend slope, amplitude) or a location (a DC offset), is partly or wholly gone before the transformer sees a thing, whereas a tokenizer that keeps the raw scale, like LeNEPA's, encodes it across the entire range. A flat geometry score under one model can be a genuine signal under another; it depends on what the front-end threw away. Reading these manifolds well means knowing what each model keeps at its door, and treating scale-like factors with suspicion, however clean a logarithmic ruler can make them look.

Example 4 · sine_frequency_hz

Built layer by layer

The examples so far each froze on a single layer. This one is about depth itself. Phase arrived as a clean ring at the very first layer; frequency is something Toto has to build. Sweep a sine's frequency from about 1 Hz up to the edge of what 500 Hz sampling can carry, and watch the same manifold at eight depths at once.

[Figure: The manifold across layers · Toto-2.0-2.5B · sine_frequency_hz. Each panel is one layer's centroid manifold (PCA → 2D); eight layers. Colour legend: low → high frequency.]
Each dot is one frequency; the grey path runs low → high. At layer 0 the frequencies are a tangle. Within a few layers the low and middle frequencies peel out into a smooth, ordered arc (blue → amber) — straight-line distance order climbs from 0.74 to ~0.90 and 5-NN recall from 0.60 to ~0.81 — and by the last layer the whole sweep settles into one continuous curve. The figure above each panel is how much of the variance the flat 2-D view captures.

One thing stays stubborn: the highest frequencies (red) never join the tidy arc; they sit in a scattered cloud at every depth. It is tempting to read that as "the model can't tell them apart," and the sampling limit is indeed involved: the sweep deliberately stops at 0.9 × Nyquist (225 Hz at 500 Hz sampling, barely ~2.2 samples per cycle). But the cloud is the opposite of a collapse.

Measured in the full 64-dimensional space, neighbouring high frequencies are the farthest apart of any band — the representation moves about 10× more per step there than among the low frequencies. Near Nyquist a sine is sampled only two-to-four times per cycle, so nudging the frequency throws the sample points onto wildly different parts of each cycle: the input jumps around erratically, and the representation follows. That motion is high-dimensional and disordered — only about 55% of it lands in the flat plane the picture shows, against ~80% for the clean low-frequency arc. So PCA lays the orderly low frequencies flat and crushes the erratic near-Nyquist ones into a corner. The "clump" is a shadow: a real tangle, but one that points out of the page.

[Figure: How convoluted is the manifold? Effective dimensionality vs frequency: participation ratio in a sliding frequency window, at three depths. Higher means more independent directions are needed to hold the shape.]
The high-frequency cloud is not a separate hidden variable — it is still the same frequency axis, only folded much harder. At low frequency the manifold is effectively ~2-dimensional (a line you could draw); climbing toward Nyquist it swells to roughly 15 dimensions — a serpentine you cannot flatten. The three layers sit almost on top of one another, so this is a property of the signal, not of any one depth.
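
For readers who have not met it, the participation ratio has a standard definition: with covariance eigenvalues λ, it is (Σλ)² / Σλ². The sketch below computes it for a block of centroids and then over a sliding window. It is an illustration only: the function names are hypothetical, and the window here is counted in grid points, whereas the figure's window is defined in frequency with a width the post does not state.

```python
import numpy as np

def participation_ratio(points):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance."""
    centred = points - points.mean(axis=0)
    cov = centred.T @ centred / (points.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # drop tiny negative round-off
    return float(eig.sum() ** 2 / np.sum(eig ** 2))

def sliding_participation_ratio(points, window):
    """Participation ratio of each run of `window` consecutive grid points."""
    n_windows = points.shape[0] - window + 1
    return np.array([participation_ratio(points[i:i + window])
                     for i in range(n_windows)])

# A straight segment in 5-D uses one direction; six points at +/- the three
# unit axes spread their variance equally over three directions.
line = np.outer(np.linspace(0.0, 1.0, 20), np.ones(5))
axes = np.concatenate([np.eye(3), -np.eye(3)])

pr_line = participation_ratio(line)
pr_axes = participation_ratio(axes)
pr_windows = sliding_participation_ratio(line, window=8)
```

The ratio equals the number of directions when variance is spread evenly over them and falls toward 1 as a single direction dominates, which is why it reads as an effective dimensionality.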

Why it folds: the n-th sample of a sampled sine swings with frequency at a rate proportional to n, so near Nyquist the late samples wind almost a full cycle for every extra hertz — faster than our ~1 Hz frequency grid can follow. The representation is still a single, deterministic curve in frequency — a smooth fit recovers about 90% of it — it has just coiled into too many dimensions to lay flat, and the last few percent genuinely outruns the grid.
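
The arithmetic behind that argument is short enough to write out. Sample n of a sine at frequency f, sampled at rate fs, sits at phase f·n/fs cycles, so its phase changes by n/fs cycles for every extra hertz. The sketch uses the 500 Hz rate and 512-sample window quoted in this post; the 200 Hz test frequency and the function names are arbitrary choices for illustration.

```python
FS_HZ = 500.0      # sampling rate quoted in the post
N_SAMPLES = 512    # window length quoted in the post

def sample_phase_cycles(n, freq_hz, fs_hz=FS_HZ):
    """Phase of the n-th sample of a sine at freq_hz, in cycles."""
    return freq_hz * n / fs_hz

def cycles_per_hertz(n, fs_hz=FS_HZ):
    """How fast the n-th sample's phase winds as frequency changes: n / fs."""
    return n / fs_hz

nyquist_hz = FS_HZ / 2.0              # 250 Hz
top_hz = 0.9 * nyquist_hz             # where the sweep stops
samples_per_cycle = FS_HZ / top_hz    # samples available per cycle at the top

last = N_SAMPLES - 1
# Phase change of the last sample when the frequency moves by one grid step (1 Hz).
wind = sample_phase_cycles(last, 201.0) - sample_phase_cycles(last, 200.0)
```

For the last sample the winding rate is 511/500, just over one full cycle per hertz, while the first samples barely move; a 1 Hz grid therefore undersamples the late part of the window.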

You could ask the obvious follow-up — what if we just sample finer at the tail? Let's see what happens when we re-encode the same near-Nyquist band on a grid ten times denser — 97 → 225 Hz at Δf = 0.1 Hz, 1,281 frequencies, where neighbouring sines are 0.93-correlated instead of decorrelated.
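
How correlated neighbouring sines are at each grid spacing can be checked directly on the inputs. The sketch assumes zero-phase sines over the 512-sample, 500 Hz window and picks 150 Hz as a representative frequency inside the band; both choices, and the function name, are assumptions rather than the suite's settings.

```python
import numpy as np

def neighbour_correlation(freq_hz, delta_hz, n_samples=512, fs_hz=500.0):
    """Pearson correlation between sines at freq_hz and freq_hz + delta_hz."""
    t = np.arange(n_samples) / fs_hz
    a = np.sin(2.0 * np.pi * freq_hz * t)
    b = np.sin(2.0 * np.pi * (freq_hz + delta_hz) * t)
    return float(np.corrcoef(a, b)[0, 1])

coarse = neighbour_correlation(150.0, 1.0)   # neighbours on the ~1 Hz grid
dense = neighbour_correlation(150.0, 0.1)    # neighbours on the 0.1 Hz grid

# Number of grid points from 97 Hz to 225 Hz inclusive at 0.1 Hz spacing.
n_dense = int(round((225.0 - 97.0) / 0.1)) + 1
```

Under these assumptions the 1 Hz neighbours come out nearly uncorrelated and the 0.1 Hz neighbours strongly correlated, so the denser grid gives the model inputs that change gradually from one step to the next.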

The visual "spray" does resolve into a thread you can now follow, but it never lies flat. Three PCA axes still hold only a quarter to a half of it, and laying it down to 90% would take roughly two dozen axes. Sampling the coil more finely just reveals more of the coiling — at the deepest layer the dense tail is, if anything, less flat than the sparse one (participation ratio 16 → 22). The manifold is real and deterministic; it simply lives in more directions than a flat page can draw.

[Interactive figure: The dense tail, rotatable · Toto-2.0-2.5B · sine_frequency_hz (97–225 Hz, Δf = 0.1 Hz). PCA → 3D of the 10×-denser near-Nyquist tail. Colour legend: low → high frequency.]
The same near-Nyquist tail, re-encoded on a grid ten times denser (Δf = 0.1 Hz, 1,281 frequencies), coloured low → high frequency. At the coarse spacing the points scatter. Densified, they knit into one continuous ribbon you can trace, yet it still crosses itself from every viewing angle: PC1–3 hold only 25–47% of its spread. Continuity is here; flatness is not.
[Figure: How many axes does it take to lay the tail flat? Cumulative PCA variance vs number of axes kept, at three depths.]
The first three axes — all a 2-D or 3-D picture can show — capture just 25–47% of the tail; reaching 90% takes about 24–26 axes. This manifold is real but tangled in more than three axes.
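
Counting the axes needed for a given share of variance is a cumulative sum over PCA eigenvalues. A minimal sketch with NumPy's SVD; the function names and the toy data are illustrative, not taken from the suite.

```python
import numpy as np

def cumulative_variance(points):
    """Fraction of total variance captured by the first 1, 2, ... PCA axes."""
    centred = points - points.mean(axis=0)
    singular = np.linalg.svd(centred, compute_uv=False)  # sorted, largest first
    variance = singular ** 2
    return np.cumsum(variance) / variance.sum()

def axes_needed(points, target=0.90):
    """Smallest number of leading PCA axes whose variance share reaches `target`."""
    cum = cumulative_variance(points)
    return int(np.searchsorted(cum, target) + 1)

# Toy data: spread 3, 2 and 1 along three orthogonal axes, so the variance
# shares are 18/28, 8/28 and 2/28.
toy = np.concatenate([np.diag([3.0, 2.0, 1.0]), -np.diag([3.0, 2.0, 1.0])])

shares = cumulative_variance(toy)
n_for_90 = axes_needed(toy, target=0.90)
```

Applied to a layer's centroids, `cumulative_variance` gives the curve plotted in the figure and `axes_needed` the point where it crosses 90%.
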
[Figure: Unrolled · a neighbour-preserving view (Isomap). Colour legend: low → high frequency.]
Is it still one curve? Isomap sees only neighbour distances, never the frequency label — yet at the final layer it lays the tail out in almost perfect frequency order (rank correlation ρ = 0.997). The tangle is genuinely a single one-dimensional curve in frequency; it is just coiled too tightly for flat PCA axes to lay down. Earlier layers (ρ ≈ 0.71–0.73) unfold only partway, where the coil still crosses itself.

Open research · a community effort

This is research in progress — come build it with us

Everything here is early, evolving, and deliberately open. We think representation geometry for time-series models is far too interesting — and too big — to keep to ourselves, so we'd love to turn this into a genuine community effort. Collaboration, questions, pushback, and wild ideas are all very welcome.

  • Want your model in the benchmark? We're happy to add more architectures and checkpoints.
  • Missing a factor or geometry you care about? Suggest new targets and sweep families.
  • Spotted a bug, a misleading metric, or a better way to measure? Tell us — we'll fix it.
  • Want to dig in together? We're keen to co-investigate and publish more findings as a group.

If any of that resonates, just send us a message. The open research wiki below has the full idea writeup, current thinking, and how to reach us.

Open research wiki & contact → alex-wiki.langotime.ai/ideas/aionoscope-manifold-reconstruction-benchmark