Tsinghua University and Tencent Hunyuan Release Spatial-TTT, a Streaming Spatial Intelligence Framework with State-of-the-Art Results on Video Spatial Benchmarks
Tsinghua University and Tencent Hunyuan have released Spatial-TTT, a streaming spatial intelligence framework that uses Test-Time Training (TTT) to continuously update its spatial memory from live video streams. The framework achieves state-of-the-art results on video spatial benchmarks such as VSI-Bench, and the code, a 97k-sample dataset, and a lightweight model are now publicly available.
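For readers unfamiliar with the term, the general idea behind test-time training can be illustrated with a toy example: a small set of "fast weights" acts as memory and takes one gradient step per incoming frame, so the memory keeps adapting while the stream is being processed. The sketch below is a generic illustration only, not the Spatial-TTT implementation. The linear memory, the squared-error objective, and every name in it (`ttt_step`, `stream`, `sq_error`, the toy key/value features) are assumptions made for this example.

```python
# Generic test-time-training sketch (hypothetical, NOT the Spatial-TTT code).
# A linear memory W is updated online so that W @ key approximates value.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_error(W, key, value):
    """Squared reconstruction error ||W @ key - value||^2."""
    return sum((p - v) ** 2 for p, v in zip(matvec(W, key), value))

def ttt_step(W, key, value, lr=0.1):
    """One gradient step on the loss 0.5 * ||W @ key - value||^2."""
    err = [p - v for p, v in zip(matvec(W, key), value)]
    # The gradient with respect to W is the outer product err * key^T.
    return [[w - lr * e * k for w, k in zip(row, key)]
            for row, e in zip(W, err)]

def stream(frames, dim, lr=0.1):
    """Consume (key, value) frame features one by one, updating the memory."""
    W = [[0.0] * dim for _ in range(dim)]
    for key, value in frames:
        W = ttt_step(W, key, value, lr)
    return W

# Toy stream: the same (key, value) pair observed over 20 "frames".
frames = [([1.0, 0.0], [0.5, -0.5])] * 20
memory = stream(frames, dim=2, lr=0.5)
```

In this toy setup the reconstruction error for the repeated pair shrinks with every frame, which is the property that lets such a memory track a scene as more video arrives. A real system would derive the keys and values from a visual encoder and would use a richer memory and objective.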