PR #1203

open

Add non-record recurrent 0011+g2 R768 submission

by raider99k
val_bpb
1.3557
Architecture
RNN
Optimizer
Artifact Size
12.62MB

Training Techniques

Architecture
depth recurrence
Stripped recurrent block_anchor model with fixed four-step recurrence schedule 0,0,1,1 (A-A-C-C) and grouped anchor gating.
parameters: {"schedule":"0,0,1,1","groups":2,"model_dim":768,"num_heads":12,"num_kv_heads":6,"shared_cores":2}
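
As a rough illustration of how a fixed recurrence schedule such as 0,0,1,1 can drive two shared cores, here is a minimal sketch. It is not the submission's trainer: `parse_schedule`, `run_recurrence`, and the toy cores are hypothetical names, and the cores are stand-in arithmetic functions rather than transformer blocks. Grouped anchor gating is not modeled. The head arithmetic is derived directly from the parameters listed above.

```python
import json

# Parameters copied from the listing above.
params = json.loads(
    '{"schedule":"0,0,1,1","groups":2,"model_dim":768,'
    '"num_heads":12,"num_kv_heads":6,"shared_cores":2}'
)


def parse_schedule(spec, num_cores):
    """Turn a comma-separated schedule string into a list of core indices."""
    steps = [int(tok) for tok in spec.split(",")]
    for s in steps:
        if not 0 <= s < num_cores:
            raise ValueError(f"core index {s} out of range for {num_cores} cores")
    return steps


def run_recurrence(x, cores, steps):
    """Apply the shared cores in schedule order, reusing weights across steps."""
    for idx in steps:
        x = cores[idx](x)
    return x


steps = parse_schedule(params["schedule"], params["shared_cores"])

# Derived attention shapes: per-head width and query heads per KV head.
head_dim = params["model_dim"] // params["num_heads"]
queries_per_kv = params["num_heads"] // params["num_kv_heads"]

# Toy stand-ins for the two shared cores (A at index 0, C at index 1).
toy_cores = [lambda v: v + 1, lambda v: v * 2]
result = run_recurrence(3, toy_cores, steps)
```

With this schedule each core is applied twice in a row, so the model has an effective depth of four steps while storing only two sets of core weights.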
Quantization
int8
bits: 8
scope: model
Compression
zlib
level: null
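
The int8 plus zlib packaging can be sketched as follows. This is an assumption-laden illustration, not the submission's export code: it uses symmetric per-tensor scaling, which the listing does not confirm, and it treats `level: null` as meaning zlib's default compression level. The helper names (`quantize_int8`, `dequantize_int8`, `pack`, `unpack`) are hypothetical.

```python
import array
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor quantization to signed 8-bit (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


def pack(q):
    """Compress the raw int8 bytes with zlib at its default level (assumed)."""
    return zlib.compress(q.tobytes())


def unpack(blob):
    """Inverse of pack: decompress and reinterpret the bytes as int8."""
    q = array.array("b")
    q.frombytes(zlib.decompress(blob))
    return q


weights = [0.5, -1.0, 0.25, 0.0, 1.0]
q, scale = quantize_int8(weights)
blob = pack(q)
restored = dequantize_int8(unpack(blob), scale)
```

The zlib stage is lossless, so all of the error comes from rounding to int8. Under this scheme the error is at most half a quantization step per weight.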
Sequence Length
sequence_length
train_length: 2048
eval_length: null
LR Schedule
linear warmup
parameters: {"warmup_steps":5}
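
A linear warmup over 5 steps can be sketched as a multiplier on the base learning rate. The indexing convention (`step + 1`, so the first step already trains at one fifth of the base rate) and the constant rate after warmup are assumptions; the listing gives only `warmup_steps`. `warmup_scale` is a hypothetical helper name.

```python
def warmup_scale(step, warmup_steps=5):
    """Learning-rate multiplier: ramps linearly to 1.0, then stays there."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


# Multipliers for the first seven optimizer steps (0-indexed).
scales = [warmup_scale(s) for s in range(7)]
```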

Novel Contributions

  • Non-record unlimited-compute submission under the 16MB track
  • Stripped recurrent block_anchor candidate with A-A-C-C recurrence schedule
  • Grouped anchor gating (g2) with a simplified exported trainer snapshot
  • Int8 + zlib packaged artifact fitting within the submission size limit