Thinking Machines Lab Launches 'Interaction Models' Capable of Real-Time Multimodal AI With No External Scaffolding
Summary
Thinking Machines Lab unveils 'interaction models,' a new class of AI that natively handles real-time audio, video, and text simultaneously, with no external scaffolding. Built on a 200ms micro-turn design, the models introduce capabilities such as proactive visual reaction and time-triggered speech that, according to the lab, no existing commercial model currently offers.
Key Points
- Thinking Machines Lab is unveiling a research preview of 'interaction models,' a new class of AI systems that natively handle real-time, multimodal interaction across audio, video, and text without relying on external scaffolding or harnesses.
- Unlike traditional turn-based AI models, which wait for the user to finish before responding, interaction models use a multi-stream, 200ms micro-turn design. This enables simultaneous speech, visual proactivity, time awareness, and seamless dialog management, keeping humans continuously in the loop.
- Benchmarks released by the lab show TML-Interaction-Small leading competing models in both interaction quality and responsiveness, and demonstrating capabilities, such as proactive visual reaction and time-triggered speech, that no existing commercial model currently performs.
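
The micro-turn design described above can be pictured as a loop that, every 200ms, gathers whatever has arrived on each input stream and lets the model decide whether to speak, react, or stay silent. The sketch below is a minimal illustration of that idea under assumptions from the announcement (the 200ms cadence, parallel audio/video/text streams); every name here (`MicroTurnEngine`, `decide`, the toy policy) is hypothetical and not TML's actual API.

```python
from collections import deque
from dataclasses import dataclass, field

MICRO_TURN_MS = 200  # cadence stated in the announcement

@dataclass
class MicroTurnEngine:
    """Hypothetical sketch of a multi-stream micro-turn loop (not TML's API)."""
    streams: dict = field(default_factory=lambda: {
        "audio": deque(), "video": deque(), "text": deque()})
    clock_ms: int = 0  # a real system would be driven by a wall clock

    def feed(self, stream: str, event: str) -> None:
        """Events arrive asynchronously on any stream, mid-turn included."""
        self.streams[stream].append(event)

    def tick(self):
        """One micro-turn: advance 200ms, drain all streams, pick an action."""
        self.clock_ms += MICRO_TURN_MS
        observed = {name: list(q) for name, q in self.streams.items() if q}
        for q in self.streams.values():
            q.clear()
        return self.decide(observed)

    def decide(self, observed: dict):
        """Stand-in policy: react to visual events even when not addressed
        (proactive visual reaction), reply to text, otherwise stay silent."""
        if "video" in observed:
            return ("react", observed["video"][-1])
        if "text" in observed:
            return ("speak", f"reply to {observed['text'][-1]!r}")
        return ("silent", None)

engine = MicroTurnEngine()
engine.feed("text", "hello")
print(engine.tick())  # the model answers within one 200ms micro-turn
print(engine.tick())  # nothing arrived: staying silent is a valid action
engine.feed("video", "user holds up a diagram")
print(engine.tick())  # proactive reaction to a purely visual event
```

The key contrast with turn-based models is that `tick()` fires on a timer rather than on end-of-user-input, so silence, interruption, and unprompted reaction all fall out of the same loop.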