Research

New 'Wall Attention' Variant Delivers Per-Channel Forgetting Rates and Efficient Autoregressive Decoding With Full GQA Support

Jun 03, 2026
GitHub

Wall Attention is a new attention variant that applies a per-channel, per-timestep multiplicative decay, so each channel can forget at its own rate. It ships with optimized Triton kernels that provide efficient autoregressive decoding, full grouped-query attention (GQA) support, attention sinks, sliding windows, and sequence packing. According to the announcement, numerical accuracy has been verified.
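The summary does not spell out how the decay enters the attention computation. As a rough illustration only, the sketch below assumes one plausible reading: each channel of the query-key product is scaled by the product of that channel's decay gates over the timesteps between key and query, followed by an ordinary causal softmax. The function name `decayed_causal_attention`, the gate tensor `g`, and this formulation are all assumptions made for illustration. This is not the released Triton kernel or its API, and it omits GQA, attention sinks, sliding windows, and sequence packing.

```python
import math

def decayed_causal_attention(q, k, v, g):
    """Reference-style sketch (hypothetical, not the Wall Attention kernel).

    q, k, g: T x D nested lists; v: T x Dv nested list.
    g[t][c] is an assumed decay gate in [0, 1] for channel c at timestep t.
    Assumed score: sum_c q[t][c] * k[s][c] * prod_{r=s+1..t} g[r][c], scaled by 1/sqrt(D).
    """
    T, D, Dv = len(q), len(q[0]), len(v[0])
    out = []
    for t in range(T):
        # decay[c] is the product of g[r][c] for s < r <= t; it is 1 when s == t.
        decay = [1.0] * D
        scores = [0.0] * (t + 1)
        for s in range(t, -1, -1):  # walk backwards from the current token
            scores[s] = sum(q[t][c] * decay[c] * k[s][c] for c in range(D)) / math.sqrt(D)
            # Extend the decay window one step further into the past.
            decay = [decay[c] * g[s][c] for c in range(D)]
        # Numerically stable softmax over the causal prefix 0..t.
        m = max(scores)
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / z for j in range(Dv)])
    return out

# Two tokens, one channel. With all gates at 1 this reduces to plain causal attention.
q = [[1.0], [1.0]]
k = [[1.0], [1.0]]
v = [[1.0], [0.0]]
no_decay = decayed_causal_attention(q, k, v, [[1.0], [1.0]])
full_decay = decayed_causal_attention(q, k, v, [[1.0], [0.0]])
```

With every gate set to 1 the second token attends equally to both positions. Setting the gate at the second timestep to 0 zeroes the contribution of the older key to the score, which shifts weight toward the current token. Because the gates are per channel, different channels could be given different rates in the same call.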
