PR #1345

open

Add non-record submission for full80 recurrence seq1536 20k

by shasank0001
val_bpb
1.1763
Architecture
Transformer
Artifact Size
15,501,266 bytes

Training Techniques

Architecture
depth recurrence
Fixed middle-block recurrence added to the full80 mixed-lowbit family.
parameters: {"layers":10,"model_dim":576,"num_heads":8,"num_kv_heads":4,"recurrent_mode":"fixed","recurrent_core_start":3,"recurrent_core_len":2,"recurrent_steps":2}
step embedding
Learned step-aware recurrence embeddings were added to the recurrent blocks.
parameters: {"enabled":true,"init_std":0.01}
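The listed parameters imply a 10-layer transformer whose middle blocks (starting at layer 3, length 2) are reused for 2 recurrence steps, with a learned per-step embedding added before each pass. Below is a minimal sketch of that wiring under those listed parameters; the `Block` class and all names are stand-ins, not the submission's code.

```python
import torch
import torch.nn as nn

LAYERS, MODEL_DIM = 10, 576
CORE_START, CORE_LEN, STEPS = 3, 2, 2  # from the listed parameters

class Block(nn.Module):
    """Stand-in transformer block (attention + MLP elided)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, x):
        return x + self.proj(x)

class RecurrentTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(Block(MODEL_DIM) for _ in range(LAYERS))
        # Learned step-aware recurrence embeddings, one vector per step,
        # initialized with std 0.01 as in the listed parameters.
        self.step_emb = nn.Parameter(torch.randn(STEPS, MODEL_DIM) * 0.01)

    def forward(self, x):
        # Prefix blocks run once.
        for i in range(CORE_START):
            x = self.blocks[i](x)
        # Fixed recurrence: reuse the middle blocks STEPS times, adding the
        # step embedding so the shared weights can distinguish steps.
        core = self.blocks[CORE_START:CORE_START + CORE_LEN]
        for s in range(STEPS):
            x = x + self.step_emb[s]
            for blk in core:
                x = blk(x)
        # Suffix blocks run once.
        for i in range(CORE_START + CORE_LEN, LAYERS):
            x = self.blocks[i](x)
        return x
```

Sharing the middle blocks adds effective depth (here 12 block applications from 10 layers of weights) without increasing the parameter count toward the byte cap.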
Sequence Length
sequence_length
train_length: 1536
eval_length: null
Regularization
magnitude pruning
parameters: {"pct":0.033}
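Magnitude pruning at pct 0.033 zeroes the 3.3% of weights with the smallest absolute value. A minimal global-threshold sketch (function and variable names are illustrative, not the submission's):

```python
import numpy as np

def magnitude_prune(weights, pct=0.033):
    """Zero the fraction `pct` of entries with the smallest |w|,
    using one global threshold across all tensors."""
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(len(flat) * pct)
    if k == 0:
        return weights
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(flat), k - 1)[k - 1]
    return [np.where(np.abs(w) <= threshold, 0.0, w) for w in weights]
```

Zeroed weights compress well on export, which is presumably why pruning pairs with the low-bit export rescue below.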
Quantization
mixed int4/int8
bits: 4
scope: export rescue
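For the 4-bit part of the mixed int4/int8 export, a common scheme is symmetric per-tensor quantization to the signed int4 range [-8, 7]. The sketch below shows that scheme as an assumption; the submission's actual exporter and rounding choices are not specified here.

```python
import numpy as np

def quantize_int4(w):
    """Map float weights to signed int4 levels [-8, 7] with one
    per-tensor scale (symmetric quantization sketch)."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct approximate float weights from int4 levels."""
    return q.astype(np.float32) * scale
```

Each value then needs only 4 bits plus a shared scale, roughly halving artifact size versus int8 for the tensors in scope.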

Novel Contributions

  • Non-record unlimited-compute submission for the 16MB track
  • Recurrence-based continuation of the full80 mixed-lowbit family
  • Fixed middle-block recurrence with learned step-aware recurrence embeddings
  • Training at sequence length 1536
  • Export rescue using BIGRAM_EXPORT_BITS=4 and MAG_PRUNE_PCT=0.033
  • Achieved val_bpb 1.17631839 while staying under the 16,000,000 byte cap