PR #947
Non-record: Legal Neural-Only No-TTT Alt (8xH100) val_bpb=1.1576
by aamodbhatt
val_bpb: 1.1576
Architecture: Transformer
Optimizer: —
Artifact Size: 14,921,440 bytes
Training Techniques
Test-Time Training (score-first TTT): disabled
parameters: {"enabled":false}
Evaluation
Sliding window eval: disabled
parameters: {"enabled":false}
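As a rough sketch of what is being turned off (the helper name and chunking rule are assumptions, not the repo's code): sliding-window evaluation rescores overlapping context windows but only counts the newly seen tokens, while the disabled setting amounts to plain non-overlapping chunking (stride equal to window).

```python
def eval_windows(n_tokens: int, window: int, stride: int):
    """List (begin, end, n_scored) evaluation windows.

    stride == window: plain non-overlapping chunking (the disabled case here).
    stride <  window: sliding-window eval; each window rescores the overlap
    for context but only the last (end - prev_end) tokens count toward loss,
    so no token is double-counted. Hypothetical helper, not the repo's code.
    """
    windows, prev_end = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return windows

# Non-overlapping (disabled) vs sliding: same tokens scored, fewer passes.
print(eval_windows(10, 4, 4))  # [(0, 4, 4), (4, 8, 4), (8, 10, 2)]
```

Sliding eval needs more forward passes for the same token count, which is why disabling it keeps runtime bounded.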
Architecture
MLP3x
Larger neural configuration with an increased MLP width multiplier.
parameters: {"mlp_mult":3.2}
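A minimal sketch of what the multiplier controls, assuming the usual convention that the feed-forward hidden width is a multiple of the model dimension (the `d_model` value and the rounding rule are assumptions, not taken from the submission):

```python
def mlp_hidden_dim(d_model: int, mlp_mult: float) -> int:
    """Hidden width of the transformer feed-forward block as a multiple
    of the model dimension. The rounding rule is an assumption; the
    preset may round or align to a hardware-friendly multiple instead."""
    return int(round(d_model * mlp_mult))

# With the submission's MLP_MULT=3.2 and a hypothetical d_model of 768:
print(mlp_hidden_dim(768, 3.2))  # 2458 (768 * 3.2 = 2457.6, rounded)
```

Compared with the common default multiplier of 4, a multiplier of 3.2 trades some feed-forward width for parameter budget elsewhere.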
BigramHash
Uses a larger bigram vocabulary size override in the model preset.
parameters: {"bigram_vocab_size":2048}
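To illustrate the idea behind a hashed bigram table (only the table size of 2048 comes from the submission's parameters; the mixing constant and function name are hypothetical): each (previous, current) token pair is hashed into one of a fixed number of embedding buckets, so a larger `bigram_vocab_size` means fewer hash collisions between distinct bigrams.

```python
def bigram_bucket(prev_token: int, cur_token: int,
                  bigram_vocab_size: int = 2048) -> int:
    """Map a (previous, current) token pair to one of bigram_vocab_size
    embedding buckets. The multiplicative mixing constant is a
    hypothetical choice, not taken from the repo."""
    h = prev_token * 1000003 + cur_token  # simple multiplicative mix
    return h % bigram_vocab_size

# Every bucket index stays inside the 2048-entry embedding table:
ids = [bigram_bucket(p, c) for p, c in [(0, 0), (17, 42), (50256, 50255)]]
print(ids)
```

Raising the override from a smaller default to 2048 enlarges this table, which is one source of the submission's increased artifact size.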
Novel Contributions
- Compliance-focused neural-only submission
- No n-gram or two-pass cache blending during evaluation
- No test-time training
- Larger model preset with BIGRAM_VOCAB_SIZE=2048 and MLP_MULT=3.2
- Sliding eval disabled to keep runtime bounded