val_bpb
1.2257
Architecture
Transformer
Optimizer
—
Artifact Size
15,916,206 bytes
Training Techniques
Architecture
AuxNet
A tiny auxiliary network runs alongside the main LM. It predicts whether the next token has a leading space and produces a small residual correction to the LM logits.
parameters: {"aux_dim":32,"bottleneck_in":512,"bottleneck_out":512}
tied embeddings
Uses a tied embedding matrix to project the auxiliary residual edit back into LM logits.
parameters: null
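
The card does not spell out how AuxNet is wired, so the following is only a minimal sketch of one plausible reading: the hidden state is compressed to a few auxiliary features, one head scores the leading-space boundary, and a second projection yields an edit in embedding space that the tied embedding matrix turns into a logit correction. All names (`aux_forward`, `w_down`, `w_up`, `w_boundary`), the `tanh` nonlinearity, and the toy sizes are assumptions for illustration; the card lists `aux_dim` 32 and 512-wide bottlenecks.

```python
import math
import random

def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def aux_forward(hidden, w_down, w_boundary, w_up, embedding):
    """Hypothetical AuxNet step for a single position.

    hidden:     main-LM hidden state, length d_model
    w_down:     aux_dim x d_model, compresses hidden into aux features
    w_boundary: length aux_dim, scores "next token has a leading space"
    w_up:       d_model x aux_dim, expands aux features back to d_model
    embedding:  vocab x d_model tied embedding matrix
    """
    aux = [math.tanh(v) for v in matvec(w_down, hidden)]
    boundary_logit = sum(w * a for w, a in zip(w_boundary, aux))
    edit = matvec(w_up, aux)                   # residual edit in embedding space
    residual_logits = matvec(embedding, edit)  # tied projection: E @ edit
    return boundary_logit, residual_logits

rng = random.Random(0)
d_model, aux_dim, vocab = 8, 4, 10  # toy sizes, not the card's values
hidden = [rng.uniform(-1, 1) for _ in range(d_model)]
embedding = rand_matrix(vocab, d_model, rng)
boundary_logit, residual_logits = aux_forward(
    hidden,
    rand_matrix(aux_dim, d_model, rng),
    [rng.uniform(-0.1, 0.1) for _ in range(aux_dim)],
    rand_matrix(d_model, aux_dim, rng),
    embedding,
)
lm_logits = matvec(embedding, hidden)  # main LM head using the same tied matrix
corrected_logits = [l + r for l, r in zip(lm_logits, residual_logits)]
```

Because the correction passes through the same matrix as the LM head, it adds no vocabulary-sized parameters of its own in this sketch.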
smear transformation
Applies a smear transformation to token embeddings using a learned lower-triangular matrix to encourage progressive feature building.
parameters: null
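
The card does not say which axis the lower-triangular matrix acts on. The sketch below assumes it mixes the feature dimension, so that output feature i depends only on input features 0..i, which is one way to read "progressive feature building". If the matrix instead acts over sequence positions, the same arithmetic applies with positions in place of features. The helper names `smear` and `mask_lower` and the example weights are hypothetical.

```python
def mask_lower(matrix):
    """Zero entries above the diagonal so a dense learned matrix stays lower-triangular."""
    return [[v if j <= i else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def smear(embedding, lower):
    """Compute out[i] = sum over j <= i of lower[i][j] * embedding[j]."""
    dim = len(embedding)
    return [sum(lower[i][j] * embedding[j] for j in range(i + 1))
            for i in range(dim)]

# Stand-in for a learned matrix; the entries above the diagonal are masked out.
learned = [
    [1.0, 0.9, 0.9],
    [0.5, 1.0, 0.9],
    [0.25, 0.5, 1.0],
]
lower = mask_lower(learned)
token_embedding = [2.0, 4.0, 8.0]
smeared = smear(token_embedding, lower)
print(smeared)  # [2.0, 5.0, 10.5]
```

The first feature passes through unchanged, while each later feature accumulates a weighted share of the ones before it.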
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
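
As a rough illustration of the int8 plus zlib pipeline, the sketch below quantizes a flat list of weights and compresses the packed bytes. The symmetric per-tensor scale and the byte layout (a float32 scale followed by int8 values) are assumptions, since the card only records 8 bits, scope `all`, and an unset compression level; the function names are hypothetical.

```python
import struct
import zlib

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

def pack(quantized, scale):
    """Serialize a float32 scale followed by int8 values, then zlib-compress."""
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(quantized)}b", *quantized)
    return zlib.compress(raw)  # zlib's default level, since the card leaves `level` unset

def unpack(blob):
    """Invert pack(): decompress, then split the scale from the int8 payload."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack("<f", raw[:4])
    quantized = list(struct.unpack(f"<{len(raw) - 4}b", raw[4:]))
    return quantized, scale

weights = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5] * 100
quantized, scale = quantize_int8(weights)
blob = pack(quantized, scale)
restored_q, restored_scale = unpack(blob)
restored = dequantize_int8(restored_q, restored_scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The round trip is lossless for the int8 values themselves; the only error relative to the original floats comes from rounding to the quantization grid.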
Other
other
An auxiliary binary cross-entropy (BCE) loss trains a binary boundary classifier that predicts whether the next token has a leading space.
parameters: {"aux_loss_weight":0.1}
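
One way the auxiliary objective might combine with the language-modeling loss is sketched below, assuming the weighted BCE term is simply added to the LM loss and that a leading space appears as a literal `" "` at the start of the token text (some tokenizers use a different marker). The function names are hypothetical; only the weight 0.1 comes from the card.

```python
import math

AUX_LOSS_WEIGHT = 0.1  # aux_loss_weight from the card

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit and a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def leading_space_targets(next_tokens):
    """1.0 where the next token's text starts with a space, else 0.0."""
    return [1.0 if tok.startswith(" ") else 0.0 for tok in next_tokens]

def total_loss(lm_loss, boundary_logits, next_tokens):
    """Assumed combination: LM loss plus the weighted mean boundary BCE."""
    targets = leading_space_targets(next_tokens)
    bce = sum(bce_with_logits(z, t)
              for z, t in zip(boundary_logits, targets)) / len(targets)
    return lm_loss + AUX_LOSS_WEIGHT * bce

next_tokens = [" the", "re", " cat", "s"]
boundary_logits = [0.0, 0.0, 0.0, 0.0]  # an untrained classifier: 50/50 everywhere
loss = total_loss(2.0, boundary_logits, next_tokens)
```

With all logits at zero, each position contributes ln 2 to the BCE, so the auxiliary term adds 0.1 × ln 2 to the LM loss here.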
Novel Contributions
- Auxiliary network that predicts next-token leading-space presence
- Residual logit correction generated from auxiliary low-dimensional features
- Auxiliary BCE loss to encourage boundary-aware representations
- Smear transformation on token embeddings with a learned lower-triangular matrix