PR #818

open

[record] add HWNODE record - 0.5527

by lucamignatti

val_bpb

0.5527

Architecture

HWNODE (Hammerstein-Wiener Neural ODE)

Optimizer

—

Artifact Size

15.74 MB

Training Techniques

Architecture

HWNODE

A linear Neural ODE placed between two non-linear projections. A Taylor-series approximation of the matrix exponential is used to generate multiple effective layers from shared weights.

parameters: {"order":2,"state_dim":864}
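As a rough illustration of the description above, the following NumPy sketch builds a truncated Taylor approximation of the matrix exponential and uses it as the linear stage between two static non-linearities. It is only a sketch under stated assumptions: the function names (`taylor_expm`, `hwnode_block`), the `tanh` non-linearities, the integration time `t`, and the small dimensions are hypothetical and are not taken from the PR. Only `order=2` mirrors the listed parameters.

```python
import numpy as np

def taylor_expm(A, order=2):
    """Truncated Taylor series of exp(A): I + A + A^2/2! + ... + A^order/order!"""
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, order + 1):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

def hwnode_block(x, W_in, A, W_out, t=1.0, order=2):
    """Static non-linearity -> linear ODE flow -> static non-linearity.

    The linear ODE dh/dt = A h has the solution h(t) = exp(tA) h(0), so the
    whole trajectory collapses to one dense matrix multiply by Phi.
    tanh is an assumed non-linearity, not confirmed by the source.
    """
    h0 = np.tanh(x @ W_in)              # input (Hammerstein-side) non-linearity
    Phi = taylor_expm(t * A, order)     # approximates exp(tA)
    h1 = h0 @ Phi.T                     # apply the flow to each row vector
    return np.tanh(h1) @ W_out          # output (Wiener-side) non-linearity
```

Evaluating `taylor_expm(t * A)` at different values of `t` yields different dense matrices from the same `A`, which is one plausible reading of "multiple effective layers from shared weights".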

spectral normalization

Applied to enforce stability and encourage orthogonality, which helps the shared weights produce more distinct effective layers.

parameters: null
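The PR lists no parameters for spectral normalization, so the sketch below shows only the generic technique: estimate the largest singular value by power iteration and divide the matrix by it. The function name, iteration count, and fixed seed are assumptions made for illustration, not details from the submission.

```python
import numpy as np

def spectral_normalize(W, n_iters=50, eps=1e-12):
    """Rescale W so its largest singular value is approximately 1."""
    rng = np.random.default_rng(0)       # fixed seed, for reproducibility only
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps     # estimate of the right singular vector
        u = W @ v
        u /= np.linalg.norm(u) + eps     # estimate of the left singular vector
    sigma = u @ W @ v                    # estimate of the top singular value
    return W / (sigma + eps)
```

Bounding the spectral norm of the state matrix keeps repeated applications of the linear flow from blowing up, which is consistent with the stability motivation stated above.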

Sequence Length

sequence_length

train_length: 1024

eval_length: null

Other

other

Drift-free TTT and Cubric N-gram processing were used to improve the final score.

parameters: null

Test-Time Training

TTT

parameters: null

Quantization

mixed int5/int6

bits: null

scope: all
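The entry does not say which tensors use 5 bits and which use 6, nor how scales are chosen. As a minimal sketch of what low-bit weight quantization can look like, the following assumes symmetric per-row scaling; the function names and the per-row choice are hypothetical and not taken from the PR.

```python
import numpy as np

def quantize_symmetric(W, bits):
    """Quantize each row of W to signed integers with the given bit width."""
    qmax = 2 ** (bits - 1) - 1                       # 15 for int5, 31 for int6
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)         # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale
```

With this scheme the reconstruction error of each weight is at most half of its row's scale, so rows quantized at 6 bits are recovered roughly twice as precisely as rows quantized at 5 bits.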

Novel Contributions

Introduced Hammerstein-Wiener Neural ODEs (HWNODEs) for the challenge.
Used a Taylor-series approximation of the matrix exponential to compile a continuous ODE trajectory into a single dense matrix pass.
Generated multiple effective layers from a single set of shared weights.
Applied spectral normalization to stabilize the model and encourage orthogonality.
Used drift-free TTT and Cubric N-gram processing to improve validation performance.