PR #818

open

[record] add HWNODE record - 0.5527

by lucamignatti
val_bpb: 0.5527
Architecture: HWNODE (Hammerstein-Wiener Neural ODE)
Optimizer: (not listed)
Artifact Size: 15.74 MB

Training Techniques

Architecture
HWNODE
A linear neural ODE wrapped between two non-linear projections. A Taylor-series approximation of the matrix exponential generates multiple effective layers from a single set of shared weights.
parameters: {"order":2,"state_dim":864}
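A minimal NumPy sketch of the structure described above, not the PR's actual code. It assumes tanh for both non-linear projections, a unit integration time, and toy dimensions; the names `W_in`, `A`, `W_out`, `taylor_expm`, and `hwnode_forward` are hypothetical. Only `order=2` and the existence of a `state_dim` parameter come from the record.

```python
import numpy as np

def taylor_expm(A, order=2):
    """Truncated Taylor series for exp(A): sum over k = 0..order of A^k / k!."""
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, order + 1):
        term = term @ A / k          # running term equals A^k / k!
        result = result + term
    return result

def hwnode_forward(x, W_in, A, W_out, order=2):
    """Non-linear projection -> linear ODE flow -> non-linear projection."""
    h0 = np.tanh(x @ W_in)           # input non-linearity (tanh is an assumption)
    Phi = taylor_expm(A, order)      # whole linear trajectory as one dense matrix
    hT = h0 @ Phi.T                  # h(T) = Phi h(0), applied to each row
    return np.tanh(hT @ W_out)       # output non-linearity (tanh is an assumption)

rng = np.random.default_rng(0)
d_model, state_dim = 16, 32          # toy sizes; the record lists state_dim 864
W_in = rng.standard_normal((d_model, state_dim)) * 0.1
A = rng.standard_normal((state_dim, state_dim)) * 0.05
W_out = rng.standard_normal((state_dim, d_model)) * 0.1

x = rng.standard_normal((4, d_model))
y = hwnode_forward(x, W_in, A, W_out, order=2)
print(y.shape)
```

Because the core is linear, the flow over the whole time interval collapses into the single matrix `Phi`, so the cost of a forward pass does not grow with the number of ODE steps it represents.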
spectral normalization
Applied to enforce stability and encourage orthogonality, which helps the shared weights produce more distinct effective layers.
parameters: null
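For reference, spectral normalization is commonly implemented with power iteration, as in the sketch below. The record does not say which variant the PR uses, so the function name `spectral_normalize`, the iteration count, and the fixed seed are assumptions for illustration.

```python
import numpy as np

def spectral_normalize(W, n_iters=50, eps=1e-12):
    """Estimate the largest singular value of W by power iteration and divide it out."""
    rng = np.random.default_rng(0)       # fixed seed only to keep the sketch reproducible
    u = rng.standard_normal(W.shape[0])
    u /= np.linalg.norm(u) + eps
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v                    # estimate of the top singular value
    return W / max(sigma, eps), sigma

W = np.diag([3.0, 1.0, 0.5])
W_sn, sigma = spectral_normalize(W)
print(sigma, np.linalg.norm(W_sn, 2))
```

Dividing by the top singular value bounds the spectral norm of the weight at 1. That bound is what limits how much the linear dynamics can amplify the state.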
Sequence Length
sequence_length
train_length: 1024
eval_length: null
Other
other
Drift-free TTT and Cubric N-gram processing were used to improve the final score.
parameters: null
Test-Time Training
TTT
parameters: null
Quantization
mixed int5/int6
bits: null
scope: all
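The record gives no detail on how the int5 and int6 widths are assigned. As a rough illustration only, the sketch below shows symmetric per-row quantization at an arbitrary bit width; the helper names `quantize_symmetric` and `dequantize`, the per-row scaling, and the int8 storage are assumptions, not the PR's scheme.

```python
import numpy as np

def quantize_symmetric(W, bits):
    """Symmetric per-row quantization to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1                       # 15 for int5, 31 for int6
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)         # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to floating point."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
for bits in (5, 6):
    q, scale = quantize_symmetric(W, bits)
    err = np.abs(dequantize(q, scale) - W).max()
    print(bits, err)
```

Each extra bit roughly halves the rounding step, so a mixed scheme can spend 6 bits where error matters most and 5 bits elsewhere to stay under a size budget.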

Novel Contributions

  • Introduced Hammerstein-Wiener Neural ODEs (HWNODEs) for the challenge.
  • Used a Taylor-series approximation of the matrix exponential to compile a continuous ODE trajectory into a single dense matrix pass.
  • Generated multiple effective layers from a single set of shared weights.
  • Applied spectral normalization to stabilize the model and encourage orthogonality.
  • Used drift-free TTT and Cubric N-gram processing to improve validation performance.
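The compilation step in the second bullet can be written out as follows. This is one reading under stated assumptions: the linear core has the form dh/dt = A h with a constant matrix A and integration time T (symbols chosen here for illustration, not taken from the PR), and the listed order of 2 is the truncation order of the series.

```latex
\frac{dh}{dt} = A\,h(t)
\;\Longrightarrow\;
h(T) = e^{AT}\,h(0),
\qquad
e^{AT} \approx \sum_{k=0}^{2} \frac{(AT)^k}{k!} = I + AT + \tfrac{1}{2}(AT)^2 .
```

Since e^{AT} = (e^{AT/K})^K for any positive integer K, the single matrix on the right is equivalent to K identical linear steps composed in sequence, which is one way to interpret "multiple effective layers" from one set of shared weights.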