PR #947

Status: open

Non-record: Legal Neural-Only No-TTT Alt (8xH100) val_bpb=1.1576

by aamodbhatt
val_bpb: 1.1576
Architecture: Transformer
Optimizer:
Artifact Size: 14,921,440 bytes

Training Techniques

Test-Time Training: score-first TTT
parameters: {"enabled": false}
Evaluation: sliding window eval
parameters: {"enabled": false}
Architecture
MLP3x: larger neural configuration using an increased MLP multiplier.
parameters: {"mlp_mult": 3.2}
BigramHash: uses a larger bigram vocabulary size override in the model preset.
parameters: {"bigram_vocab_size": 2048}

Novel Contributions

  • Compliance-focused neural-only submission
  • No n-gram or two-pass cache blending during evaluation
  • No test-time training
  • Larger model preset with BIGRAM_VOCAB_SIZE=2048 and MLP_MULT=3.2
  • Sliding-window eval disabled to keep runtime bounded
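The settings above can be collected into one configuration sketch. This is a hypothetical illustration only: the key names (`mlp_mult`, `bigram_vocab_size`, `test_time_training`, `sliding_window_eval`) mirror the parameters listed in this PR, but the actual repository's config schema may differ, and the `is_neural_only` helper is invented here for illustration.

```python
# Hypothetical consolidation of this submission's overrides.
# Key names mirror the PR's listed parameters; the real config schema may differ.
config = {
    # Architecture: larger neural preset (MLP_MULT=3.2, BIGRAM_VOCAB_SIZE=2048)
    "mlp_mult": 3.2,
    "bigram_vocab_size": 2048,
    # Compliance: neural-only path, no test-time training
    "test_time_training": {"enabled": False},
    # Runtime: sliding-window eval disabled to keep evaluation time bounded
    "sliding_window_eval": {"enabled": False},
}

def is_neural_only(cfg: dict) -> bool:
    """True when the test-time components this PR disables are all off."""
    return (not cfg["test_time_training"]["enabled"]
            and not cfg["sliding_window_eval"]["enabled"])

print(is_neural_only(config))  # → True
```

A gate like `is_neural_only` is one way a leaderboard harness could verify that a "compliance-focused" run really left every test-time component disabled before scoring it.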