val_bpb
1.3557
Architecture
RNN
Optimizer
—
Artifact Size
12.62MB
Training Techniques
Architecture
depth recurrence
Stripped recurrent block_anchor model with a fixed four-step recurrence schedule 0,0,1,1 (A-A-C-C) over two shared cores, plus grouped anchor gating.
parameters: {"schedule":"0,0,1,1","groups":2,"model_dim":768,"num_heads":12,"num_kv_heads":6,"shared_cores":2}
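A minimal sketch of how a fixed schedule such as 0,0,1,1 could route an input through two shared cores. It assumes index 0 maps to core A and index 1 to core C, as the A-A-C-C label suggests. The names `parse_schedule`, `run_recurrence`, `core_a`, and `core_c` are hypothetical, and the toy cores stand in for the real blocks, which this card does not describe.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Core = Callable[[Vector], Vector]


def parse_schedule(spec: str) -> List[int]:
    """Turn a comma-separated schedule string like "0,0,1,1" into core indices."""
    return [int(token) for token in spec.split(",")]


def run_recurrence(x: Vector, cores: Sequence[Core], schedule: Sequence[int]) -> Vector:
    """Apply the shared cores in schedule order; each core's weights are reused."""
    for core_index in schedule:
        x = cores[core_index](x)
    return x


# Toy stand-ins for the two shared cores (assumption: not the real blocks).
def core_a(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


def core_c(x: Vector) -> Vector:
    return [v * 2.0 for v in x]


schedule = parse_schedule("0,0,1,1")
out = run_recurrence([1.0, 2.0], [core_a, core_c], schedule)
print(schedule, out)
```

Under this reading the model runs four steps while storing the weights of only two cores, which is the usual motivation for depth recurrence under a size limit.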
Quantization
int8
bits: 8
scope: model
Compression
zlib
level: null
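The int8 plus zlib packaging can be sketched as follows, under stated assumptions: symmetric per-tensor quantization with a single scale (the card gives only bits and scope, not the scheme), and `level: null` read as the zlib library default. The helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
import struct
import zlib
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[bytes, float]:
    """Symmetric int8 quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    # Pack as signed bytes so each weight costs exactly one byte before compression.
    return struct.pack(f"{len(quantized)}b", *quantized), scale


def dequantize_int8(payload: bytes, scale: float) -> List[float]:
    """Invert quantize_int8 up to rounding error."""
    quantized = struct.unpack(f"{len(payload)}b", payload)
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
payload, scale = quantize_int8(weights)

# Assumption: no explicit level is passed, so zlib uses its default.
blob = zlib.compress(payload)
restored = dequantize_int8(zlib.decompress(blob), scale)
print(len(payload), restored)
```

The scale has to be stored next to the compressed bytes, so a real artifact would carry a small amount of metadata in addition to the payload.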
Sequence Length
sequence_length
train_length: 2048
eval_length: null
LR Schedule
linear warmup
parameters: {"warmup_steps":5}
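A minimal sketch of linear warmup over 5 steps. The card does not state the base learning rate or what happens after warmup, so `base_lr` is a hypothetical parameter and the rate is assumed to stay constant once warmup ends.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int = 5) -> float:
    """Ramp the learning rate linearly to base_lr, then hold it (assumed)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    # Step 0 already uses a nonzero rate; the last warmup step reaches base_lr.
    return base_lr * (step + 1) / warmup_steps


# Hypothetical base rate of 1.0, chosen only to make the ramp easy to read.
rates = [warmup_lr(step, base_lr=1.0) for step in range(7)]
print(rates)
```

With only 5 warmup steps the ramp is over almost immediately, so most of training runs at whatever rate follows it.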
Novel Contributions
- Non-record unlimited-compute submission under the 16MB track
- Stripped recurrent block_anchor candidate with A-A-C-C recurrence schedule
- Grouped anchor gating (g2) with a simplified exported trainer snapshot
- Int8 + zlib packaged artifact fitting within the submission size limit