Agents & Tool-Use Benchmarks
Function calling and structured tool routing in complex workflows

| # | Model | Provider | FC Score |
|---|---|---|---|
| 1 | Claude-Opus-4-5-20251101 | Anthropic | |
| 2 | Claude-Sonnet-4-5-20250929 | Anthropic | |
| 4 | GLM-4.6 (FC thinking) | Zhipu AI | |
| 5 | Grok-4-1-fast-reasoning | xAI | |
| 6 | Claude-Haiku-4-5-20251001 | Anthropic | |
| 7 | Gemini-3-Pro-Preview | Google | |
| 10 | Grok-4-0709 | xAI | |
| 11 | Moonshotai-Kimi-K2-Instruct | MoonshotAI | |
| 12 | Grok-4-1-fast-non-reasoning | xAI | |
| 13 | Command A Reasoning | Cohere | |
| 14 | DeepSeek-V3.2-Exp (Prompt + Thinking) | DeepSeek | |
| 15 | Gemini-2.5-Flash | Google | |
| 16 | GPT-5.2-2025-12-11 | OpenAI | |
| 17 | GPT-5-mini-2025-08-07 | OpenAI | |
| 18 | xLAM-2-32b-fc-r | Salesforce | |
| 19 | DeepSeek-V3.2-Exp | DeepSeek | |
| 20 | GPT-4.1-2025-04-14 | OpenAI | |
| 21 | o4-mini-2025-04-16 | OpenAI | |
| 22 | xLAM-2-70b-fc-r | Salesforce | |
| 24 | GPT-5-nano-2025-08-07 | OpenAI | |
| 25 | Nanbeige4-3B-Thinking-2511 | Nanbeige | |
| 27 | GPT-4.1-mini-2025-04-14 | OpenAI | |
| 29 | Qwen3-32B | Qwen | |
| 30 | o3-2025-04-16 | OpenAI | |
| 31 | Qwen3-235B-A22B-Instruct-2507 | Qwen | |
| 32 | Nanbeige3.5-Pro-Thinking | Nanbeige | |
| 34 | xLAM-2-8b-fc-r | Salesforce | |
| 35 | Command A | Cohere | |
| 36 | BitAgent-Bounty-8B | Bittensor | |
| 37 | Arch-Agent-32B | katanemo | |
| 39 | Qwen3-8B | Qwen | |
| 40 | ToolACE-2-8B | Huawei Noah & USTC | |
| 41 | Qwen3-30B-A3B-Instruct-2507 | Qwen | |
| 42 | xLAM-2-3b-fc-r | Salesforce | |
| 43 | Qwen3-14B | Qwen | |
| 46 | mistral-large-2411 | Mistral AI | |
| 49 | Mistral-Medium-2505 | Mistral AI | |
| 50 | Llama-4-Maverick-17B-128E-Instruct-FP8 | Meta | |
| 51 | Mistral-small-2506 | Mistral AI | |
| 52 | Gemini-2.5-Flash-Lite | Google | |
| 54 | Qwen3-4B-Instruct-2507 | Qwen | |
| 56 | Arch-Agent-3B | katanemo | |
| 58 | GPT-4.1-nano-2025-04-14 | OpenAI | |
| 60 | Arch-Agent-1.5B | katanemo | |
| 61 | Command R7B | Cohere | |
| 62 | Llama-3.3-70B-Instruct | Meta | |
| 64 | Hammer2.1-7b | MadeAgents | |
| 65 | xLAM-2-1b-fc-r | Salesforce | |
| 68 | Hammer2.1-3b | MadeAgents | |
| 71 | Qwen3-1.7B | Qwen | |
| 72 | Llama-4-Scout-17B-16E-Instruct | Meta | |
| 74 | CoALM-70B | UIUC + Oumi | |
| 75 | Hammer2.1-1.5b | MadeAgents | |
| 76 | palmyra-x-004 | Writer | |
| 78 | Open-Mistral-Nemo-2407 | Mistral AI | |
| 80 | Amazon-Nova-2-Lite-v1:0 | Amazon | |
| 81 | Granite-3.1-8B-Instruct | IBM | |
| 82 | Falcon3-10B-Instruct | TII UAE | |
| 83 | Granite-3.2-8B-Instruct | IBM | |
| 84 | CoALM-8B | UIUC + Oumi | |
| 86 | MiniCPM3-4B-FC | openbmb | |
| 88 | Amazon-Nova-Pro-v1:0 | Amazon | |
| 91 | Falcon3-7B-Instruct | TII UAE | |
| 92 | Qwen3-0.6B | Qwen | |
| 93 | Granite-20b-FunctionCalling | IBM | |
| 95 | Amazon-Nova-Micro-v1:0 | Amazon | |
| 98 | Llama-3.2-3B-Instruct | Meta | |
| 100 | Hammer2.1-0.5b | MadeAgents | |
| 103 | Granite-4.0-350m | IBM | |
| 104 | Falcon3-3B-Instruct | TII UAE | |
| 105 | Ministral-8B-Instruct-2410 | Mistral AI | |
| 106 | Falcon3-1B-Instruct | TII UAE | |
| 107 | Llama-3.2-1B-Instruct | Meta | |
| 108 | Llama-3.1-Nemotron-Ultra-253B-v1 | NVIDIA | |
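
For readers unfamiliar with what "function calling and structured tool routing" involves, the following is a minimal sketch of one routing step: a model emits a JSON call, and the host validates it against a tool registry before dispatching. The registry `TOOLS`, the tool names `get_weather` and `add_numbers`, and the `{"name": ..., "arguments": ...}` call shape are all hypothetical and chosen for illustration. This is not the scoring method behind the FC Score column.

```python
import json

# Hypothetical tool registry: tool names, parameters, and types are
# illustrative assumptions, not taken from any benchmark.
TOOLS = {
    "get_weather": {
        "params": {"city": str, "unit": str},
        "required": {"city"},
        "handler": lambda city, unit="celsius": f"weather for {city} in {unit}",
    },
    "add_numbers": {
        "params": {"a": (int, float), "b": (int, float)},
        "required": {"a", "b"},
        "handler": lambda a, b: a + b,
    },
}


def route_call(raw: str):
    """Parse a model-emitted call, validate it, and dispatch to the tool."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    spec = TOOLS[name]
    # Every required parameter must be present.
    missing = spec["required"] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    # No unknown parameters, and each value must have the declared type.
    for key, value in args.items():
        if key not in spec["params"]:
            raise ValueError(f"unexpected argument: {key!r}")
        if not isinstance(value, spec["params"][key]):
            raise ValueError(f"wrong type for {key!r}")
    return spec["handler"](**args)
```

A call such as `route_call('{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}')` dispatches to the handler, while an unknown tool name or a missing required argument raises `ValueError`. Rejecting malformed calls before execution is one common design choice. Real systems often validate against a full JSON Schema instead of the simple type map assumed here.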