PR #1021

open

Non-record: MC Dropout ensembling is negative for small LMs

by abaybektursun on GitHub
val_bpb
1.3250
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Regularization
dropout
parameters: {"rates":[0.3,0.05]}
Evaluation
MC Dropout ensembling
parameters: {"k":16}

Novel Contributions

  • Evaluated MC Dropout ensembling for a 17M-parameter language model
  • Showed that averaging the predictive distributions of 16 dropout samples at inference does not lower BPB
  • Found deterministic single-pass inference outperforms MC Dropout at both tested dropout rates
  • Argued that dropout-induced sub-network diversity is insufficient at this scale
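The comparison above can be sketched in miniature: MC Dropout averages the predictive distributions of k stochastic forward passes (here k=16, matching the record's parameters), while deterministic inference does a single pass with activations scaled by the keep probability. Everything below is a hypothetical toy stand-in — a linear byte-level head over a random hidden vector — not the 17M-parameter transformer from the run; only the p=0.3 dropout rate and k=16 come from the record.

```python
import math
import random

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy linear "model": hidden vector h, output weights W (vocab x dim).
# All shapes and values here are illustrative placeholders.
random.seed(0)
DIM, VOCAB = 8, 4
h = [random.gauss(0, 1) for _ in range(DIM)]
W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
P_DROP = 0.3  # one of the two tested dropout rates (0.3, 0.05)

def forward(mask):
    # logits = W @ (h * mask), where mask implements dropout on h
    return [sum(w[i] * h[i] * mask[i] for i in range(DIM)) for w in W]

def deterministic_probs():
    # standard single-pass inference: scale activations by keep prob
    keep = 1.0 - P_DROP
    return softmax(forward([keep] * DIM))

def mc_dropout_probs(k=16):
    # MC Dropout: keep dropout active at inference and average
    # the predictive distributions over k stochastic passes
    acc = [0.0] * VOCAB
    for _ in range(k):
        mask = [1.0 if random.random() > P_DROP else 0.0
                for _ in range(DIM)]
        p = softmax(forward(mask))
        acc = [a + pi / k for a, pi in zip(acc, p)]
    return acc

def bpb(probs, target):
    # bits-per-byte contribution of one target byte: -log2 p(target)
    return -math.log2(probs[target])

target = 2  # arbitrary "next byte" index for illustration
print("deterministic bpb:", bpb(deterministic_probs(), target))
print("MC Dropout   bpb:", bpb(mc_dropout_probs(16), target))
```

In the record's actual runs, the MC Dropout average came out no better than the single deterministic pass at either dropout rate; this sketch only shows the mechanics of the two evaluation modes, not that result.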