PR #1208

open

Nightcrawler — 1.176bpb 10mb

by newjordan
val_bpb: 1.1761
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 10MB

Training Techniques

Architecture
crawler bottleneck
Adds a fifth flat transformer layer on each side of the crawler bottleneck, changing the stack from 4F+1C+4F to 5F+1C+5F.
parameters: {"layers_per_side":5,"previous_layers_per_side":4,"bottleneck_layers":1}
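As a rough illustration of the layout change, the sketch below builds the layer ordering from the parameters above and prints it in the 4F+1C+4F notation. The function names `stack_layout` and `describe` are hypothetical and are not taken from the PR; the sketch only models the ordering of flat (F) and crawler (C) layers, not the layers themselves.

```python
from itertools import groupby


def stack_layout(layers_per_side: int, bottleneck_layers: int = 1) -> list[str]:
    """Return layer kinds in order: flat layers, crawler bottleneck, flat layers.

    Hypothetical helper: "F" marks a flat transformer layer and "C" marks a
    crawler bottleneck layer, following the notation used in the description.
    """
    if layers_per_side < 0 or bottleneck_layers < 0:
        raise ValueError("layer counts must be non-negative")
    side = ["F"] * layers_per_side
    return side + ["C"] * bottleneck_layers + side


def describe(layout: list[str]) -> str:
    """Run-length encode a layout, e.g. ['F', 'F', 'C'] -> '2F+1C'."""
    return "+".join(f"{len(list(group))}{kind}" for kind, group in groupby(layout))


previous = stack_layout(layers_per_side=4, bottleneck_layers=1)
current = stack_layout(layers_per_side=5, bottleneck_layers=1)
print(describe(previous), "->", describe(current))
print("total layers:", len(previous), "->", len(current))
```

Under this reading, the change adds two layers in total (one per side), taking the stack from 9 to 11 layers.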
shared TAP encoder connections
Each crawler loop uses the same shared TAP encoder connections.
parameters: null
Evaluation
sliding window eval
parameters: null
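The PR gives no parameters for the sliding-window evaluation, so the following is only a generic sketch of how such an evaluation is commonly set up, not the author's implementation. The window length, the stride, and the `score_fn` callable (assumed to return one negative log-likelihood in nats per token of the window it is given) are all assumptions made for illustration.

```python
import math
from typing import Callable, Iterator, Sequence


def sliding_windows(n_tokens: int, window: int, stride: int) -> Iterator[tuple[int, int, int]]:
    """Yield (start, end, score_from) triples.

    The model sees tokens [start, end) as context, but only tokens in
    [score_from, end) are scored, so every token is counted exactly once.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + window, n_tokens)
        yield start, end, prev_end
        prev_end = end
        if end == n_tokens:
            break


def sliding_window_bpb(
    tokens: Sequence[int],
    n_bytes: int,
    window: int,
    stride: int,
    score_fn: Callable[[Sequence[int]], Sequence[float]],
) -> float:
    """Bits per byte: total NLL in nats, converted to bits, divided by byte count."""
    total_nats = 0.0
    for start, end, score_from in sliding_windows(len(tokens), window, stride):
        nll = score_fn(tokens[start:end])  # hypothetical: one NLL (nats) per window token
        total_nats += sum(nll[score_from - start:])  # keep only tokens not scored before
    return total_nats / (math.log(2) * n_bytes)


# Toy check with a stand-in scorer that charges exactly 1 bit per token.
one_bit_scorer = lambda chunk: [math.log(2)] * len(chunk)
bpb = sliding_window_bpb(list(range(10)), n_bytes=20, window=4, stride=2, score_fn=one_bit_scorer)
print(bpb)
```

A smaller stride gives each scored token more preceding context at the cost of more forward passes; with `stride == window` the evaluation reduces to non-overlapping chunks.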

Novel Contributions

  • Adds an extra flat transformer layer on each side of the crawler bottleneck
  • Shares TAP encoder connections across crawler loops
  • Reports sliding-window validation BPB results