PR #437

open

commit non-record

by jupram

val_bpb

1.2257

Architecture

Transformer

Optimizer

—

Artifact Size

15,916,206 bytes

Training Techniques

Architecture

AuxNet

A tiny auxiliary network runs alongside the main language model (LM). It predicts whether the next token has a leading space and also generates a small residual correction to the LM's logits.

parameters: {"aux_dim":32,"bottleneck_in":512,"bottleneck_out":512}
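A minimal sketch of how such a side network could be wired, assuming (this is not stated in the PR) that `bottleneck_in` is the width of the LM hidden state fed in, that `aux_dim` is the width of the low-dimensional auxiliary features, and that `bottleneck_out` is the width of the residual edit. The `AuxNet` class, its weight names, and the ReLU nonlinearity are all hypothetical, chosen only to illustrate the shape of the computation in dependency-free Python.

```python
import math
import random

# Sizes taken from the parameters line; their exact roles are assumptions.
BOTTLENECK_IN = 512   # assumed: width of the LM hidden state fed to the aux net
AUX_DIM = 32          # low-dimensional auxiliary feature size
BOTTLENECK_OUT = 512  # assumed: width of the residual edit (embedding width)


def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def init(rows, cols, rng, scale):
    """Gaussian-initialized rows x cols matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class AuxNet:
    """Hypothetical side network: h -> z (AUX_DIM) -> boundary logit + residual edit."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_down = init(AUX_DIM, BOTTLENECK_IN, rng, 1.0 / math.sqrt(BOTTLENECK_IN))
        self.w_boundary = [rng.gauss(0.0, 1.0 / math.sqrt(AUX_DIM)) for _ in range(AUX_DIM)]
        self.w_up = init(BOTTLENECK_OUT, AUX_DIM, rng, 1.0 / math.sqrt(AUX_DIM))

    def forward(self, h):
        # Down-project the hidden state and apply ReLU (nonlinearity is an assumption).
        z = [max(0.0, v) for v in matvec(self.w_down, h)]
        # Scalar logit for "the next token has a leading space".
        boundary_logit = sum(w * v for w, v in zip(self.w_boundary, z))
        # Up-project the small features into a residual edit in embedding space.
        residual = matvec(self.w_up, z)
        return boundary_logit, residual


net = AuxNet()
logit, edit = net.forward([0.01 * i for i in range(BOTTLENECK_IN)])
print(len(edit))  # 512
```

Keeping the shared representation at 32 dimensions is what makes the network "tiny": the two projections together cost about 2 × 512 × 32 weights.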

tied embeddings

Reuses the tied (shared input/output) embedding matrix to project the auxiliary network's residual edit into the LM's logit space.

parameters: null
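A toy illustration of the projection, assuming the edit is a vector in embedding space that is simply added after both it and the LM hidden state pass through the same vocab × d matrix. The function names and the `edit_scale` knob are hypothetical; the PR does not say how the edit is weighted. Because the projection is linear, adding the projected edit is equivalent to projecting `h + r`.

```python
def tied_project(embedding, vec):
    """Project a d-dim vector to vocab logits with the tied embedding matrix (vocab x d)."""
    return [sum(e * v for e, v in zip(row, vec)) for row in embedding]


def corrected_logits(embedding, hidden, aux_edit, edit_scale=1.0):
    """LM logits plus the aux residual edit, both projected through the same tied matrix.

    edit_scale is a hypothetical weighting knob, not taken from the PR.
    """
    base = tied_project(embedding, hidden)
    delta = tied_project(embedding, aux_edit)
    return [b + edit_scale * d for b, d in zip(base, delta)]


E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vocabulary of 3 tokens, width 2
h = [2.0, -1.0]                            # LM hidden state
r = [0.5, 0.5]                             # auxiliary residual edit
print(corrected_logits(E, h, r))  # [2.5, -0.5, 2.0]
```

Reusing the tied matrix means the correction adds no vocabulary-sized parameters, which matters when the whole artifact is size-constrained.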

smear transformation

Applies a "smear" transformation to the token embeddings: they are multiplied by a learned lower-triangular matrix to encourage progressive feature building.

parameters: null
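The PR does not say which axis the lower-triangular matrix acts on. The sketch below assumes it mixes along the sequence axis, so position `t` only blends in itself and earlier positions and causality is preserved. The `smear` function and the masking-by-loop approach are hypothetical illustrations; a real model would learn `weights` and would need the matrix to be at least as large as the context length.

```python
def smear(embeddings, weights):
    """Mix each position with itself and earlier positions only.

    embeddings: T rows of d floats (one row per token position).
    weights:    T x T learned matrix; entries above the diagonal are ignored,
                so the effective matrix is lower-triangular (causal).
    """
    T = len(embeddings)
    d = len(embeddings[0])
    out = []
    for t in range(T):
        row = [0.0] * d
        for s in range(t + 1):  # only s <= t contributes: the lower-triangular mask
            w = weights[t][s]
            for k in range(d):
                row[k] += w * embeddings[s][k]
        out.append(row)
    return out


x = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
W = [[1.0, 9.0, 9.0],  # the 9.0 entries sit above the diagonal and are ignored
     [0.5, 1.0, 9.0],
     [0.0, 0.5, 1.0]]
print(smear(x, W))  # [[1.0, 0.0], [0.5, 1.0], [2.0, 2.5]]
```

Initializing `weights` near the identity would make the transformation start as a no-op and let training decide how much earlier context to smear in.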

Quantization

int8

bits: 8

scope: all
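The card only records 8-bit quantization applied to all weights. As an illustration, here is a symmetric per-tensor int8 scheme (one float scale per tensor, codes clamped to [-127, 127]); the granularity, clamping range, and function names are assumptions, not the submission's confirmed method.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (assumed scheme): returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round() maps each scaled value to the nearest integer; clamp to the int8 range used.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


w = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(w)
print(codes)  # [50, -127, 0, 100]
```

With round-to-nearest, each dequantized weight is within `scale / 2` of the original; per-row scales would tighten that bound at the cost of storing more scales.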

Compression

zlib

level: null
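Since no level is recorded, the sketch below uses zlib's default level (Python's `zlib.compress` defaults to `Z_DEFAULT_COMPRESSION`, which zlib maps to level 6). The byte layout (a little-endian float32 scale and uint32 count, then one signed byte per code) and the helper names are hypothetical, chosen only to show int8 codes being serialized and compressed losslessly.

```python
import struct
import zlib


def pack_tensor(codes, scale):
    """Serialize int8 codes plus a float32 scale, then zlib-compress at the default level."""
    header = struct.pack("<fI", scale, len(codes))  # float32 scale + uint32 element count
    body = struct.pack(f"<{len(codes)}b", *codes)   # one signed byte per code
    return zlib.compress(header + body)


def unpack_tensor(blob):
    """Inverse of pack_tensor: decompress and parse header and codes."""
    raw = zlib.decompress(blob)
    scale, n = struct.unpack_from("<fI", raw, 0)
    codes = list(struct.unpack_from(f"<{n}b", raw, 8))  # header is 8 bytes
    return codes, scale


codes = [0] * 1000 + [5, -3, 127]
blob = pack_tensor(codes, 0.01)
restored, scale = unpack_tensor(blob)
print(restored == codes, len(blob) < len(codes))  # True True
```

Quantized weights tend to cluster around a few small codes, which is what gives a general-purpose compressor like zlib something to exploit on top of the 4x reduction from float32 to int8.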

Other

other

An auxiliary binary cross-entropy (BCE) loss trains a binary word-boundary classifier that predicts whether the next token has a leading space.

parameters: {"aux_loss_weight":0.1}
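A per-token sketch of how the weighted objective could combine, assuming the total loss is the LM cross-entropy plus `aux_loss_weight` times the BCE of the boundary logit (in training both terms would be averaged over positions). The function names are hypothetical; the BCE uses the standard numerically stable form on raw logits.

```python
import math

AUX_LOSS_WEIGHT = 0.1  # from the parameters line


def cross_entropy(logits, target):
    """Next-token cross-entropy via a stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[target]


def bce_with_logits(logit, label):
    """Stable binary cross-entropy on a raw logit; label is 0.0 or 1.0."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def total_loss(lm_logits, next_token, boundary_logit, next_has_space):
    """LM loss plus the weighted auxiliary boundary loss."""
    label = 1.0 if next_has_space else 0.0
    return cross_entropy(lm_logits, next_token) + AUX_LOSS_WEIGHT * bce_with_logits(boundary_logit, label)


print(round(total_loss([0.0, 0.0], 1, 0.0, True), 6))  # 0.762462
```

With uniform LM logits over two tokens and a zero boundary logit, both terms equal ln 2, so the total is 1.1 × ln 2. The small weight keeps the boundary task from dominating the LM objective.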

Novel Contributions

Auxiliary network that predicts whether the next token has a leading space
Residual logit correction generated from auxiliary low-dimensional features
Auxiliary BCE loss to encourage boundary-aware representations
Smear transformation on token embeddings with a learned lower-triangular matrix