Deep Learning

New 'Wall Attention' Variant Delivers Per-Channel Forgetting Rates and Efficient Autoregressive Decoding With Full GQA Support

Jun 03, 2026
GitHub

Wall Attention, a new attention variant, applies per-channel, per-timestep multiplicative decay so that each channel has its own forgetting rate. It is backed by optimized Triton kernels that support efficient autoregressive decoding, full GQA, attention sinks, sliding windows, and sequence packing, with verified numerical accuracy.

US Commerce Department Closes Loophole Letting Chinese AI Firms Acquire Advanced Chips Through Overseas Subsidiaries

Jun 02, 2026
TNW | Nvidia

The US Commerce Department closes a major loophole that allowed Chinese AI firms to secretly acquire advanced Nvidia and AMD chips through overseas subsidiaries. Export restrictions are now tied to the nationality of a company's headquarters rather than to its physical location, though hundreds of thousands of chips may have already slipped through during a …

MiniMax M3 Launches as Open-Weight AI Model Outperforming GPT-5.5 With 1M Token Context and Native Multimodal Support

Jun 02, 2026
MiniMax

MiniMax M3 launches as an open-weight AI model that outperforms GPT-5.5 and Gemini 2.5 Pro on key benchmarks. It features a 1M token context window, native multimodal support, and autonomous capabilities powerful enough to independently reproduce a research paper over 12 hours. It is available now via desktop app and API starting …
