val_bpb
0.5527
Architecture
HWNODE (Hammerstein-Wiener Neural ODE)
Optimizer
—
Artifact Size
15.74 MB
Training Techniques
Architecture
HWNODE
A linear neural ODE sandwiched between two nonlinear projections. A Taylor-series approximation of the matrix exponential produces multiple effective layers from a single set of shared weights.
parameters: {"order":2,"state_dim":864}
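The description above can be illustrated with a minimal NumPy sketch. It is not the submission's code: it assumes that `order` is the Taylor truncation order, that the nonlinearity is `tanh`, and that the ODE is integrated over unit time. The names `taylor_expm`, `hwnode_block`, `W_in`, and `W_out` are hypothetical.

```python
import numpy as np

def taylor_expm(A, order=2):
    """Truncated Taylor series of exp(A): I + A + A^2/2! + ... + A^order/order!."""
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, order + 1):
        term = term @ A / k        # term now holds A^k / k!
        result = result + term
    return result

def hwnode_block(x, W_in, A, W_out, order=2):
    """Nonlinearity -> linear ODE flow -> nonlinearity (Hammerstein-Wiener shape).

    The linear ODE dh/dt = A h has the closed-form solution h(1) = exp(A) h(0),
    so the whole trajectory collapses into one dense matrix multiply.
    """
    h = np.tanh(x @ W_in)                # input projection (tanh is an assumption)
    h = h @ taylor_expm(A, order).T      # approximate exp(A) applied to each row
    return np.tanh(h @ W_out)            # output projection (tanh is an assumption)

# Toy usage with a small state dimension (the record lists state_dim 864).
rng = np.random.default_rng(0)
state_dim = 8
x = rng.standard_normal((4, 16))
W_in = rng.standard_normal((16, state_dim)) * 0.1
A = rng.standard_normal((state_dim, state_dim)) * 0.1
W_out = rng.standard_normal((state_dim, 16)) * 0.1
y = hwnode_block(x, W_in, A, W_out)
```

Because exp(A) is the limit of (I + A/n)^n, a single matrix exponential behaves like many small residual steps that all share the matrix A, which is one way to read "multiple effective layers from shared weights".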
spectral normalization
Applied to enforce stability and encourage orthogonality, which helps the shared weights produce more distinct effective layers.
parameters: null
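For reference, spectral normalization is commonly implemented with power iteration, as in the sketch below. The record does not say which matrices are normalized or how, so this is a generic illustration rather than the submission's method; `spectral_normalize` and its parameters are hypothetical names.

```python
import numpy as np

def spectral_normalize(W, n_iters=50, eps=1e-12, seed=0):
    """Rescale W so that its largest singular value is approximately 1.

    The top singular value is estimated by power iteration on W and W^T.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v) + eps
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u) + eps
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
    sigma = u @ W @ v                  # estimate of the largest singular value
    return W / max(sigma, eps)

# Toy usage: normalize a random matrix.
W = np.random.default_rng(1).standard_normal((6, 6))
W_sn = spectral_normalize(W)
```

Capping the largest singular value at 1 keeps repeated applications of the matrix from amplifying the state, which is the stability property the entry refers to.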
Sequence Length
sequence_length
train_length: 1024
eval_length: null
Other
other
Drift-free TTT and Cubric N-gram processing were used to improve the final score.
parameters: null
Test-Time Training
TTT
parameters: null
Quantization
mixed int5/int6
bits: null
scope: all
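As an illustration of what int5 and int6 storage implies, the sketch below applies symmetric per-row quantization at a chosen bit width. The record does not state the quantization scheme or which tensors receive 5 versus 6 bits, so the per-row scaling and the names `quantize_symmetric` and `dequantize` are assumptions.

```python
import numpy as np

def quantize_symmetric(W, bits):
    """Symmetric per-row quantization to signed integers of the given bit width.

    int5 uses levels in [-15, 15]; int6 uses levels in [-31, 31].
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # avoid division by zero on all-zero rows
    q = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to floating point."""
    return q.astype(np.float32) * scale

# Toy usage: quantize the same matrix at both bit widths.
W = np.random.default_rng(2).standard_normal((4, 32)).astype(np.float32)
q5, s5 = quantize_symmetric(W, bits=5)
q6, s6 = quantize_symmetric(W, bits=6)
err5 = np.abs(dequantize(q5, s5) - W).max()
err6 = np.abs(dequantize(q6, s6) - W).max()
```

The extra bit halves the step size per row, so a mixed scheme can spend 6 bits on tensors that are more sensitive to rounding and 5 bits elsewhere.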
Novel Contributions
- Introduced Hammerstein-Wiener Neural ODEs (HWNODEs) for the challenge.
- Used a Taylor-series approximation of the matrix exponential to compile a continuous ODE trajectory into a single dense matrix pass.
- Generated multiple effective layers from a single set of shared weights.
- Applied spectral normalization to stabilize the model and encourage orthogonality.
- Used drift-free TTT and Cubric N-gram processing to improve validation performance.