beam-search svm Attention-UNet
Beam-search-based Hugging Face implementation for the perplexity and CIDEr metrics.
- Input
  - 4814-dim embedding
- Encoder
  - 123 x Attention-UNet with 48 heads
- Output
  - f1 projection
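
The dimensions listed above can be collected in a small config object. This is only a sketch: the class name `ModelConfig`, its field names, and the `head_split` helper are hypothetical and not part of the implementation. Only the three numbers come from the list. The helper assumes the embedding is split evenly across attention heads, which the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the architecture list; field names are hypothetical.
    embed_dim: int = 4814   # input embedding width
    num_blocks: int = 123   # number of stacked Attention-UNet blocks
    num_heads: int = 48     # attention heads per block

    def head_split(self) -> tuple:
        """Return (per-head width, leftover dims) for an even split across heads."""
        return divmod(self.embed_dim, self.num_heads)


cfg = ModelConfig()
head_dim, leftover = cfg.head_split()
print(f"{cfg.num_blocks} blocks, {cfg.num_heads} heads, "
      f"head_dim={head_dim}, leftover={leftover}")
```

A nonzero leftover means 4814 does not divide evenly by 48, so if attention heads share the embedding width equally, the attention layers would need their own projection width or padding. The listing does not say which applies.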
Training config
optimizer=AdamW, lr=0.292, scheduler=linear, warmup=1632
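
To make the schedule concrete, here is a dependency-free sketch of how `lr=0.292`, `scheduler=linear`, and `warmup=1632` might combine. It assumes linear warmup from zero followed by linear decay to zero, the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers. The function name `linear_schedule_lr` and the `total_steps` value are hypothetical, since the config does not give the total number of training steps.

```python
def linear_schedule_lr(step, base_lr=0.292, warmup_steps=1632, total_steps=20_000):
    """Learning rate at `step` for linear warmup followed by linear decay.

    base_lr and warmup_steps come from the training config;
    total_steps is a hypothetical value, not stated in the source.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the start, mid-warmup, end of warmup, and end of training.
for step in (0, 816, 1632, 20_000):
    print(step, linear_schedule_lr(step))
```

The rate peaks at the configured `lr` exactly when warmup ends (step 1632) and is clamped to zero for any step at or beyond `total_steps`.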