New FinMCP-Bench Benchmark Tests AI Models on Real-World Financial Problem-Solving With 613 Samples and 65 Financial Tools
Summary
A new benchmark called FinMCP-Bench has launched to rigorously test AI models on real-world financial problem-solving. It features 613 samples, 65 real financial tools, and 33 sub-scenarios designed to measure both tool-invocation accuracy and reasoning capabilities across mainstream large language models.
Key Points
- FinMCP-Bench is a new benchmark designed to evaluate large language models on real-world financial problem-solving through tool invocation under the Model Context Protocol (MCP).
- The benchmark comprises 613 samples across 10 main scenarios and 33 sub-scenarios, incorporates 65 real financial MCP tools, and covers three task types: single-tool, multi-tool, and multi-turn interactions.
- Researchers use FinMCP-Bench to systematically assess mainstream LLMs with metrics that measure both tool invocation accuracy and reasoning capabilities, providing a standardized testbed for financial AI agent research.
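To make the tool-invocation metric concrete, here is a minimal sketch of how exact-match invocation accuracy could be scored over benchmark-style samples. The field names (`expected_calls`, `predicted_calls`), the example tool names, and the exact-match scoring rule are illustrative assumptions, not FinMCP-Bench's published evaluation protocol.

```python
# Hypothetical sketch of scoring tool-invocation accuracy on benchmark-style
# samples. Schema and scoring rule are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class Sample:
    task_type: str            # "single_tool", "multi_tool", or "multi_turn"
    expected_calls: list      # ordered (tool_name, args) pairs the task requires
    predicted_calls: list = field(default_factory=list)  # calls the model made

def invocation_accuracy(samples):
    """Fraction of samples whose predicted calls exactly match the expected ones."""
    if not samples:
        return 0.0
    correct = sum(1 for s in samples if s.predicted_calls == s.expected_calls)
    return correct / len(samples)

samples = [
    Sample("single_tool",
           expected_calls=[("get_stock_quote", {"ticker": "AAPL"})],
           predicted_calls=[("get_stock_quote", {"ticker": "AAPL"})]),
    Sample("multi_tool",
           expected_calls=[("get_fx_rate", {"pair": "EURUSD"}),
                           ("convert", {"amount": 100})],
           predicted_calls=[("get_fx_rate", {"pair": "EURUSD"})]),  # missing a call
]
print(invocation_accuracy(samples))  # 0.5
```

A real harness would also need looser matching for multi-turn tasks (e.g., order-insensitive or partial-credit scoring), which is one reason a benchmark like this reports separate metrics for invocation accuracy and reasoning quality.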