Google Develops Infini-Attention AI That Processes Infinite Context Using 114× Less Memory

Jan 11, 2026
Towards Data Science

Summary

Google researchers unveil Infini-attention, a Transformer architecture that processes arbitrarily long context with a reported 114× smaller memory footprint. Instead of retaining entire conversation histories, it compresses past segments into a fixed-size memory matrix. The model scales to inputs of up to 1 million tokens, reaching state-of-the-art results on book summarization with 500K-token contexts and improved accuracy on long-range information retrieval.
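
What makes the savings possible is that the compressive memory has constant size: a standard KV cache grows linearly with context length, while Infini-attention keeps one small matrix (plus a normalization vector) per head per layer. Here is a rough back-of-the-envelope comparison under assumed model dimensions; note the paper's 114× figure comes from its own comparison against a memory-augmented Transformer baseline, not from this calculation:

```python
# Back-of-the-envelope memory comparison (all dimensions assumed for
# illustration; not the configuration behind the reported 114x figure).
n_layers, n_heads, d_head = 32, 32, 128
seq_len = 1_000_000  # the 1M-token context from the article

# Vanilla attention caches K and V for every past token.
kv_cache_floats = 2 * seq_len * n_layers * n_heads * d_head

# Infini-attention keeps a fixed (d_head x d_head) memory matrix M plus a
# d_head-sized normalization vector z per head per layer, regardless of length.
infini_floats = n_layers * n_heads * (d_head * d_head + d_head)

print(f"KV cache:           {kv_cache_floats:>15,} floats (grows with seq_len)")
print(f"compressive memory: {infini_floats:>15,} floats (constant)")
print(f"ratio:              {kv_cache_floats / infini_floats:,.0f}x")
```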

Key Points

  • Infini-attention achieves effectively unbounded context with a reported 114× reduction in memory use: instead of keeping the entire conversation history, it stores a compressed summary of past segments in a fixed-size memory matrix
  • The architecture splits attention into a local mechanism for the immediate context and a global linear-attention mechanism that reads from the compressed history, with a learned gating parameter balancing the short-term and long-term information flows (see the sketch after this list)
  • In testing, the model processes sequences of up to 1 million tokens, achieves state-of-the-art performance on book summarization with 500K-token contexts, and shows improved accuracy on needle-in-a-haystack retrieval tests
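
Based on the mechanism described above, here is a minimal single-head PyTorch sketch of that recurrence, following the linear memory-update rule from the Infini-attention paper (which also describes a delta-rule variant, omitted here). The class name InfiniAttentionHead, the dimensions, and the segment-list interface are illustrative; multi-head combination and output projections are left out.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity the paper uses for linear attention
    return F.elu(x) + 1.0

class InfiniAttentionHead(torch.nn.Module):
    """Single attention head: local softmax attention over the current segment
    plus linear-attention retrieval from a fixed-size compressive memory."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q_proj = torch.nn.Linear(d_model, d_head, bias=False)
        self.k_proj = torch.nn.Linear(d_model, d_head, bias=False)
        self.v_proj = torch.nn.Linear(d_model, d_head, bias=False)
        self.beta = torch.nn.Parameter(torch.zeros(1))  # learned gate
        self.d_head = d_head

    def forward(self, segments):
        """segments: list of (batch, seg_len, d_model) tensors. The memory M
        stays (d_head, d_head) no matter how many segments stream through."""
        batch = segments[0].shape[0]
        M = torch.zeros(batch, self.d_head, self.d_head)  # compressive memory
        z = torch.zeros(batch, self.d_head, 1)            # normalization term
        outputs = []
        for x in segments:
            Q, K, V = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            # 1) Global path: retrieve long-term context from memory.
            sq = elu_plus_one(Q)                          # (B, L, d)
            A_mem = (sq @ M) / (sq @ z + 1e-6)            # (B, L, d)
            # 2) Local path: causal softmax attention within the segment.
            scores = Q @ K.transpose(-2, -1) / self.d_head ** 0.5
            L = x.shape[1]
            causal = torch.triu(torch.ones(L, L, dtype=torch.bool,
                                           device=x.device), diagonal=1)
            A_local = torch.softmax(scores.masked_fill(causal, float("-inf")),
                                    dim=-1) @ V
            # 3) Learned scalar gate mixes long-term and short-term flows.
            g = torch.sigmoid(self.beta)
            outputs.append(g * A_mem + (1.0 - g) * A_local)
            # 4) Fold this segment's keys/values into the fixed-size memory.
            sk = elu_plus_one(K)                          # (B, L, d)
            M = M + sk.transpose(-2, -1) @ V              # linear update rule
            z = z + sk.sum(dim=1).unsqueeze(-1)           # (B, d, 1)
        return torch.cat(outputs, dim=1)

# Illustrative usage: four 128-token segments stream through one head while
# the memory stays a fixed 32x32 matrix.
head = InfiniAttentionHead(d_model=64, d_head=32)
out = head([torch.randn(2, 128, 64) for _ in range(4)])
print(out.shape)  # torch.Size([2, 512, 32])
```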
