Google Launches Agentic Vision in Gemini 3 Flash, Enabling AI to Actively Investigate Images Through Code Execution
Summary
Google has launched Agentic Vision in Gemini 3 Flash, a capability that lets the model actively investigate images by writing and executing Python code to crop, zoom into, and annotate them. The model works in a Think-Act-Observe loop, and Google reports a 5-10% quality improvement across vision benchmarks.
Key Points
- Google has launched Agentic Vision in Gemini 3 Flash, a new capability that turns image understanding from a single static pass into an active investigation by combining visual reasoning with code execution
- The capability follows a Think-Act-Observe loop: the model analyzes the query (Think), generates and runs Python code that crops or annotates the image (Act), then inspects the transformed result as additional context (Observe)
- Google reports 5-10% quality improvements across vision benchmarks; new use cases include zooming in on fine details, annotating images as a visual scratchpad, and solving visual math problems by plotting
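To make the Think-Act-Observe loop concrete, here is a toy sketch in plain Python. It is not Google's implementation and does not use the Gemini API. The image is a hypothetical grid of integer pixel intensities, and a simple "brightest quadrant" heuristic (a hypothetical stand-in) takes the place of the model's reasoning in the Think step. Cropping plays the role of the Act step, and recording each crop box in a context list plays the role of Observe, so the loop repeatedly zooms in on the region of interest.

```python
# Toy illustration of a Think-Act-Observe zoom loop (hypothetical, not the Gemini API).
Image = list[list[int]]  # toy grayscale image: rows of pixel intensities


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Act: return a sub-image, analogous to a zoom-in crop."""
    return [row[left:left + width] for row in img[top:top + height]]


def brightest_quadrant(img: Image) -> tuple[int, int, int, int]:
    """Think (stand-in heuristic): pick the quadrant with the highest total intensity."""
    hh, hw = len(img) // 2, len(img[0]) // 2
    candidates = [(0, 0), (0, hw), (hh, 0), (hh, hw)]

    def score(corner: tuple[int, int]) -> int:
        return sum(sum(row) for row in crop(img, corner[0], corner[1], hh, hw))

    top, left = max(candidates, key=score)
    return top, left, hh, hw


def investigate(img: Image, max_steps: int = 3, min_size: int = 2):
    """Run the loop until the view is too small or the step budget runs out."""
    context: list[tuple[int, int, int, int]] = []  # Observe: record of each view taken
    view = img
    for _ in range(max_steps):
        if len(view) < 2 * min_size or len(view[0]) < 2 * min_size:
            break  # cannot split further into quadrants of at least min_size
        box = brightest_quadrant(view)  # Think: decide where to look
        view = crop(view, *box)         # Act: transform the image
        context.append(box)             # Observe: feed the result back as context
    return view, context
```

For example, on an 8x8 all-zero image with a single bright pixel at row 6, column 1, `investigate` zooms in twice, ending on a 2x2 view that contains the bright pixel. In the real system the Think step is the model itself and the Act step is arbitrary generated Python, so the heuristic here only illustrates the control flow.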