PR #437

open

commit non-record

val_bpb
1.2257
Architecture
Transformer
Optimizer
Artifact Size
15,916,206 bytes

Training Techniques

Architecture
AuxNet
A tiny auxiliary network runs alongside the main LM to predict whether the next token has a leading space and to generate a small residual logit correction.
parameters: {"aux_dim":32,"bottleneck_in":512,"bottleneck_out":512}
tied embeddings
Reuses the tied embedding matrix to project the auxiliary network's residual edit back into LM logit space.
parameters: null
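One way to read the AuxNet and tied-embedding entries above is as a small edit in model space that the shared embedding matrix maps into logit space. The sketch below is an illustration under assumptions, not the PR's code. The names `corrected_logits`, `W_up`, `E`, and the toy sizes are hypothetical, and the PR's actual sizes (aux_dim 32, bottleneck 512) are not reproduced here.

```python
def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def corrected_logits(hidden, aux_feat, E, W_up):
    """Assumed shapes: hidden (d_model), aux_feat (aux_dim),
    E (vocab x d_model, tied with the input embedding), W_up (d_model x aux_dim)."""
    base = matvec(E, hidden)        # usual tied output head: logits = E h
    edit = matvec(W_up, aux_feat)   # lift low-dimensional aux features into model space
    delta = matvec(E, edit)         # the same E projects the edit into logit space
    return [b + d for b, d in zip(base, delta)]

# Toy sizes (hypothetical): vocab = 3, d_model = 2, aux_dim = 1.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_up = [[0.5], [-0.5]]
h = [2.0, 1.0]
a = [2.0]
logits = corrected_logits(h, a, E, W_up)  # [3.0, 0.0, 3.0]
```

Because both terms pass through the same matrix, the correction is equivalent to computing `E (h + W_up a)`, so no extra vocabulary-sized parameters are needed under this reading.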
smear transformation
Multiplies token embeddings by a learned lower-triangular matrix (the smear transformation), intended to encourage progressive feature building.
parameters: null
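The smear entry above does not say which axis the lower-triangular matrix acts on. The sketch below assumes it mixes the feature dimension, so output feature i depends only on input features 0..i. If the PR applies it across sequence positions instead, the same masking idea would hold with the roles swapped. The names `tril_mask` and `smear` and the weights are hypothetical.

```python
def tril_mask(W):
    """Zero every entry above the diagonal of a square weight matrix."""
    return [[w if j <= i else 0.0 for j, w in enumerate(row)]
            for i, row in enumerate(W)]

def smear(x, W):
    """Apply the masked (lower-triangular) matrix to one embedding vector."""
    L = tril_mask(W)
    return [sum(l * v for l, v in zip(row, x)) for row in L]

# Toy example with d = 3; the 9.0 entries sit above the diagonal and are masked out.
W = [[1.0, 9.0, 9.0],
     [0.5, 1.0, 9.0],
     [0.25, 0.5, 1.0]]
x = [2.0, 4.0, 8.0]
y = smear(x, W)  # [2.0, 5.0, 10.5]
```

Masking a full matrix at every call keeps the learned parameter dense while guaranteeing the triangular structure, which is a common way to enforce such a constraint during training.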
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
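The quantization and compression entries above can be illustrated with a minimal round trip. This sketch assumes symmetric per-tensor int8 quantization with a single scale, and it reads "level: null" as zlib's default compression level. Neither detail is confirmed by the PR. The function names and the byte layout are hypothetical.

```python
import struct
import zlib

def quantize_int8(weights):
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def pack(weights):
    """Serialize scale (float64) followed by int8 values, then zlib-compress."""
    q, scale = quantize_int8(weights)
    raw = struct.pack("<d", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw)  # default level, assumed from "level: null"

def unpack(blob):
    """Invert pack(): decompress, read the scale, dequantize."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<d", raw, 0)
    q = struct.unpack_from(f"<{len(raw) - 8}b", raw, 8)
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.0, 0.99]
restored = unpack(pack(weights))
```

Under this scheme each restored weight differs from the original by at most half a quantization step (`scale / 2`), and the compressed blob length is what would count toward an artifact size budget.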
Other
other
An auxiliary binary cross-entropy (BCE) loss trains a binary boundary classifier that predicts whether the next token has a leading space.
parameters: {"aux_loss_weight":0.1}
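As a rough sketch of how the auxiliary loss above could enter training, the code below derives boundary labels from token strings and adds the weighted BCE term to the LM loss. It assumes tokens mark a leading space with a literal " " prefix (some tokenizers use a different marker) and that the two losses are simply summed. The function names are hypothetical; only the 0.1 weight comes from the listed parameters.

```python
import math

AUX_LOSS_WEIGHT = 0.1  # aux_loss_weight from the parameters above

def boundary_labels(tokens):
    """Label for position t is 1.0 if token t+1 starts with a space, else 0.0."""
    return [1.0 if nxt.startswith(" ") else 0.0 for nxt in tokens[1:]]

def bce_with_logits(logits, labels):
    """Numerically stable mean BCE: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)

def total_loss(lm_loss, aux_logits, labels):
    """Assumed combination: LM loss plus the weighted auxiliary BCE term."""
    return lm_loss + AUX_LOSS_WEIGHT * bce_with_logits(aux_logits, labels)

tokens = ["The", " cat", "s", " sat"]
labels = boundary_labels(tokens)  # [1.0, 0.0, 1.0]
loss = total_loss(2.0, [0.0, 0.0, 0.0], labels)
```

With all-zero logits the classifier is maximally uncertain, so the BCE term equals ln 2 and the auxiliary contribution to the total is 0.1 times that.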

Novel Contributions

  • Auxiliary network that predicts next-token leading-space presence
  • Residual logit correction generated from auxiliary low-dimensional features
  • Auxiliary BCE loss to encourage boundary-aware representations
  • Smear transformation on token embeddings with a learned lower-triangular matrix