PR #1627

open

Evolutionary NAS on only a 5-year-old MacBook; within 10% of baseline

by mike-ferguson
val_bpb: 1.3246
Architecture: Transformer
Optimizer: Muon
Artifact Size: 14,784,469 bytes

Training Techniques

Architecture
depth recurrence
Evolution-selected recurrent depth with shared blocks and repeated passes through layers.
parameters: null
weight tying
Global weight sharing across blocks to reduce unique parameters and increase repeated computation.
parameters: null
Gated Attention
Uses gated carryover/state mixing for recurrent layers.
parameters: null
other
Non-monotonic layer traversal order that processes odd and even layers in interleaved passes.
parameters: {"layer_traversal_mode":"odds_then_evens"}
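
The architecture entries above (shared blocks, repeated passes, gated carryover, and the `odds_then_evens` traversal) can be illustrated with a small sketch. This is not the PR's code: the 0-based indexing, the function names `traversal_order` and `recurrent_forward`, the scalar state, and the fixed `gate` value are all assumptions made for illustration.

```python
def traversal_order(num_layers, mode="odds_then_evens"):
    """Return the order in which layer indices are visited in one pass."""
    indices = list(range(num_layers))  # 0-based layer indices (assumed)
    if mode == "sequential":
        return indices
    if mode == "odds_then_evens":
        # Visit odd-indexed layers first, then even-indexed layers.
        return indices[1::2] + indices[0::2]
    raise ValueError(f"unknown traversal mode: {mode!r}")


def recurrent_forward(x, blocks, num_passes, gate, mode="odds_then_evens"):
    """Run `num_passes` passes over `blocks` in the chosen traversal order.

    `gate` blends each block's output with the carried state; a learned,
    per-layer gate is equally plausible and is not modelled here.
    """
    order = traversal_order(len(blocks), mode)
    state = x
    for _ in range(num_passes):
        for i in order:
            update = blocks[i](state)
            # Gated carryover: mix the new output with the previous state.
            state = gate * update + (1.0 - gate) * state
    return state


# Weight tying: every layer slot points at the same (hypothetical) block.
shared_block = lambda s: s + 1.0
tied_blocks = [shared_block] * 6
print(traversal_order(6))  # [1, 3, 5, 0, 2, 4]
```

With tied weights the traversal order only matters if blocks differ or carry per-layer parameters, so the two settings interact; the sketch keeps them independent for clarity.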
Quantization
QAT
bits: 8
scope: training export
GPTQ
bits: 6
scope: linears
Compression
zlib
level: null
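
To make the quantize-then-compress pipeline concrete, here is a minimal sketch that quantizes a weight list to 6-bit codes and measures the zlib-compressed size. It deliberately uses plain round-to-nearest symmetric quantization, not GPTQ (which additionally corrects rounding error using second-order information), and the function names, the single per-tensor scale, and the one-code-per-byte storage layout are assumptions rather than details taken from the PR.

```python
import struct
import zlib


def quantize_symmetric(weights, bits=6):
    """Round-to-nearest symmetric quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def compressed_size(codes, scale):
    """Bytes after zlib, storing each code in one signed byte (assumed layout)."""
    payload = struct.pack("<f", scale) + struct.pack(f"<{len(codes)}b", *codes)
    # The source lists the zlib level as null, so the library default is used.
    return len(zlib.compress(payload))


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_symmetric(weights)
print(codes, compressed_size(codes, scale))
```

Summing `compressed_size` over all tensors gives the kind of figure that would be compared against a submission size budget.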
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
LR Schedule
cosine decay
parameters: null
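
For reference, a standard cosine decay schedule can be written as below. The source gives no parameters, so `peak_lr`, `min_lr`, `total_steps`, and the absence of warmup are assumptions, and the function name is hypothetical.

```python
import math


def cosine_decay_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Learning rate that falls from peak_lr to min_lr along a half cosine."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100):
    print(step, cosine_decay_lr(step, total_steps=100, peak_lr=1.0))
```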
Regularization
weight decay
parameters: null

Novel Contributions

  • Evolutionary NAS performed entirely on a local MacBook Pro with zero cloud compute
  • Discovery of gated depth recurrence and non-monotonic layer traversal via evolutionary search
  • Use of block crossover and gene-level evolution over architecture, recurrence, quantization, and optimizer settings
  • Post-training GPTQ int6 artifact generation to fit the submission size budget
  • Demonstration of a laptop-first model reaching a val_bpb (1.3246) within 10% of the baseline
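
The block crossover and gene-level mutation named in the contributions can be sketched as a small evolutionary loop. Everything concrete here is hypothetical: the gene names and value ranges in `GENE_SPACE`, the grouping in `GENE_BLOCKS`, the elitist selection scheme, and the toy fitness function. In the real search, fitness would presumably be the measured val_bpb of a trained candidate.

```python
import random

# Hypothetical search space spanning architecture, recurrence,
# quantization, and optimizer settings.
GENE_SPACE = {
    "num_passes": [1, 2, 3, 4],
    "layer_traversal_mode": ["sequential", "odds_then_evens"],
    "quant_bits": [6, 8],
    "lr": [1e-3, 3e-3, 1e-2],
}
# Hypothetical blocks: genes in one block are inherited together.
GENE_BLOCKS = [("num_passes", "layer_traversal_mode"), ("quant_bits",), ("lr",)]


def random_genome(rng):
    return {gene: rng.choice(choices) for gene, choices in GENE_SPACE.items()}


def block_crossover(a, b, rng):
    """Copy each block of genes wholesale from one of the two parents."""
    child = {}
    for block in GENE_BLOCKS:
        parent = a if rng.random() < 0.5 else b
        for gene in block:
            child[gene] = parent[gene]
    return child


def mutate(genome, rng, rate=0.2):
    """Gene-level mutation: resample each gene independently with prob. rate."""
    out = dict(genome)
    for gene, choices in GENE_SPACE.items():
        if rng.random() < rate:
            out[gene] = rng.choice(choices)
    return out


def evolve(fitness, generations=10, pop_size=8, seed=0):
    """Minimise `fitness` (lower is better, like val_bpb) with elitism."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(population, key=fitness)[: pop_size // 2]
        children = [
            mutate(block_crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=fitness)


# Toy stand-in for "train the candidate and measure val_bpb".
toy_fitness = lambda g: abs(g["num_passes"] - 3) + (g["quant_bits"] != 6)
print(evolve(toy_fitness))
```

Keeping the top half of each generation means the best candidate found so far is never lost, which matters when each fitness evaluation is an expensive training run on a laptop.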