Research · 2026-05-06
Gradient Boosting within a Single Attention Layer
Source: arXiv cs.AI
arXiv:2604.03190v2 Abstract: Transformer attention computes a single softmax-weighted average over values -- a one-pass estimate that cannot correct its own errors. We introduce gradient-boosted attention, which applies the principle of gradient boosting...
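Because the abstract is cut off, how the paper actually wires boosting into the attention layer is not recoverable here. As a reference point only, the sketch below shows the two ingredients the excerpt does name: the single-pass softmax-weighted average over values, and the gradient-boosting principle of stage-wise corrections fitted to residuals (illustrated on a toy 1-D regression with decision stumps). The function names, the stump weak learner, and the hyperparameters are assumptions for illustration, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Standard attention: one softmax-weighted average over the values.
    This is the 'one-pass estimate' the abstract says cannot correct itself."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))          # (n_queries, n_keys)
    return weights @ V                               # single pass, no refinement

def gradient_boosting_1d(x, y, n_stages=50, eta=0.1):
    """The gradient-boosting principle on a toy 1-D regression: each stage fits
    a decision stump to the current residuals (the negative gradient of squared
    loss) and adds a damped copy of it to the running estimate."""
    pred = np.full_like(y, y.mean(), dtype=float)    # stage 0: constant estimate
    for _ in range(n_stages):
        residual = y - pred                          # errors left by earlier stages
        best_err, best_fit = np.inf, None
        for t in np.unique(x)[:-1]:                  # candidate stump thresholds
            mask = x <= t
            fit = np.where(mask, residual[mask].mean(), residual[~mask].mean())
            err = ((residual - fit) ** 2).sum()
            if err < best_err:
                best_err, best_fit = err, fit
        if best_fit is None:                         # degenerate x: nothing to split
            break
        pred = pred + eta * best_fit                 # stage-wise additive correction
    return pred
```

The contrast the abstract draws is visible in the sketch: `attention` produces its estimate in one shot, while each boosting stage explicitly revisits the errors left by the stages before it.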
arxivpapers