![[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton — GitHub Daily Trend cover](https://d3wo5wojvuv7l.cloudfront.net/t_rss_itunes_square_1400/images.spreaker.com/original/e808acb6ee9eef8320a0dac3ee5b3160.jpg)
https://github.com/triton-lang/triton/pull/7298

Rewrite the attention kernel to be persistent. This gives better performance at low context lengths. However, fp16 performance at large context lengths has suffered a bit due to a ptxas instruction scheduling issue in the so...