Agents & Tool-Use Benchmarks

Function calling and structured tool routing in complex workflows

Last updated: 2026-05-31 | Source: Berkeley Function Calling | Scale: 0–100% (higher is better) | 74 models
| Rank | Model | Provider | FC Score (%) |
|---|---|---|---|
| 1 | Claude-Opus-4-5-20251101 | Anthropic | 77.47 |
| 2 | Claude-Sonnet-4-5-20250929 | Anthropic | 73.24 |
| 4 | GLM-4.6 (FC thinking) | Zhipu AI | 72.38 |
| 5 | Grok-4-1-fast-reasoning | xAI | 69.57 |
| 6 | Claude-Haiku-4-5-20251001 | Anthropic | 68.70 |
| 7 | Gemini-3-Pro-Preview | Google | 68.14 |
| 10 | Grok-4-0709 | xAI | 61.38 |
| 11 | Moonshotai-Kimi-K2-Instruct | MoonshotAI | 59.06 |
| 12 | Grok-4-1-fast-non-reasoning | xAI | 58.29 |
| 13 | Command A Reasoning | Cohere | 57.06 |
| 14 | DeepSeek-V3.2-Exp (Prompt + Thinking) | DeepSeek | 56.73 |
| 15 | Gemini-2.5-Flash | Google | 56.24 |
| 16 | GPT-5.2-2025-12-11 | OpenAI | 55.87 |
| 17 | GPT-5-mini-2025-08-07 | OpenAI | 55.46 |
| 18 | xLAM-2-32b-fc-r | Salesforce | 54.66 |
| 19 | DeepSeek-V3.2-Exp | DeepSeek | 54.12 |
| 20 | GPT-4.1-2025-04-14 | OpenAI | 53.96 |
| 21 | o4-mini-2025-04-16 | OpenAI | 53.24 |
| 22 | xLAM-2-70b-fc-r | Salesforce | 53.07 |
| 24 | GPT-5-nano-2025-08-07 | OpenAI | 51.45 |
| 25 | Nanbeige4-3B-Thinking-2511 | Nanbeige | 51.40 |
| 27 | GPT-4.1-mini-2025-04-14 | OpenAI | 50.45 |
| 29 | Qwen3-32B | Qwen | 48.71 |
| 30 | o3-2025-04-16 | OpenAI | 48.56 |
| 31 | Qwen3-235B-A22B-Instruct-2507 | Qwen | 47.99 |
| 32 | Nanbeige3.5-Pro-Thinking | Nanbeige | 47.68 |
| 34 | xLAM-2-8b-fc-r | Salesforce | 46.68 |
| 35 | Command A | Cohere | 46.49 |
| 36 | BitAgent-Bounty-8B | Bittensor | 46.23 |
| 37 | Arch-Agent-32B | katanemo | 45.37 |
| 39 | Qwen3-8B | Qwen | 42.57 |
| 40 | ToolACE-2-8B | Huawei Noah & USTC | 42.44 |
| 41 | Qwen3-30B-A3B-Instruct-2507 | Qwen | 41.39 |
| 42 | xLAM-2-3b-fc-r | Salesforce | 41.22 |
| 43 | Qwen3-14B | Qwen | 41.03 |
| 46 | mistral-large-2411 | Mistral AI | 38.37 |
| 49 | Mistral-Medium-2505 | Mistral AI | 37.56 |
| 50 | Llama-4-Maverick-17B-128E-Instruct-FP8 | Meta | 37.29 |
| 51 | Mistral-small-2506 | Mistral AI | 37.15 |
| 52 | Gemini-2.5-Flash-Lite | Google | 36.87 |
| 54 | Qwen3-4B-Instruct-2507 | Qwen | 35.68 |
| 56 | Arch-Agent-3B | katanemo | 35.36 |
| 58 | GPT-4.1-nano-2025-04-14 | OpenAI | 33.05 |
| 60 | Arch-Agent-1.5B | katanemo | 32.14 |
| 61 | Command R7B | Cohere | 32.07 |
| 62 | Llama-3.3-70B-Instruct | Meta | 31.90 |
| 64 | Hammer2.1-7b | MadeAgents | 31.67 |
| 65 | xLAM-2-1b-fc-r | Salesforce | 30.44 |
| 68 | Hammer2.1-3b | MadeAgents | 29.71 |
| 71 | Qwen3-1.7B | Qwen | 28.41 |
| 72 | Llama-4-Scout-17B-16E-Instruct | Meta | 28.13 |
| 74 | CoALM-70B | UIUC + Oumi | 27.99 |
| 75 | Hammer2.1-1.5b | MadeAgents | 27.88 |
| 76 | palmyra-x-004 | Writer | 27.87 |
| 78 | Open-Mistral-Nemo-2407 | Mistral AI | 27.63 |
| 80 | Amazon-Nova-2-Lite-v1:0 | Amazon | 27.10 |
| 81 | Granite-3.1-8B-Instruct | IBM | 27.10 |
| 82 | Falcon3-10B-Instruct | TII UAE | 27.01 |
| 83 | Granite-3.2-8B-Instruct | IBM | 26.87 |
| 84 | CoALM-8B | UIUC + Oumi | 26.81 |
| 86 | MiniCPM3-4B-FC | openbmb | 25.55 |
| 88 | Amazon-Nova-Pro-v1:0 | Amazon | 24.97 |
| 91 | Falcon3-7B-Instruct | TII UAE | 24.03 |
| 92 | Qwen3-0.6B | Qwen | 23.93 |
| 93 | Granite-20b-FunctionCalling | IBM | 23.23 |
| 95 | Amazon-Nova-Micro-v1:0 | Amazon | 22.29 |
| 98 | Llama-3.2-3B-Instruct | Meta | 21.95 |
| 100 | Hammer2.1-0.5b | MadeAgents | 21.22 |
| 103 | Granite-4.0-350m | IBM | 18.98 |
| 104 | Falcon3-3B-Instruct | TII UAE | 16.25 |
| 105 | Ministral-8B-Instruct-2410 | Mistral AI | 11.10 |
| 106 | Falcon3-1B-Instruct | TII UAE | 11.08 |
| 107 | Llama-3.2-1B-Instruct | Meta | 10.82 |
| 108 | Llama-3.1-Nemotron-Ultra-253B-v1 | NVIDIA | 10.00 |
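
For readers who want to work with these numbers programmatically, the following is a minimal sketch, not part of the leaderboard itself. It assumes each entry is held as a `(rank, model, provider, fc_score)` tuple; the names `ROWS`, `best_per_provider`, and `is_rank_consistent` are hypothetical and chosen only for illustration. It uses a handful of rows copied from the leaderboard above, finds each provider's best-scoring model, and checks that scores never increase as rank increases (ranks may have gaps, as they do in this list).

```python
# A few rows copied from the leaderboard: (rank, model, provider, fc_score).
# The tuple layout is an assumption made for this sketch.
ROWS = [
    (1, "Claude-Opus-4-5-20251101", "Anthropic", 77.47),
    (2, "Claude-Sonnet-4-5-20250929", "Anthropic", 73.24),
    (4, "GLM-4.6 (FC thinking)", "Zhipu AI", 72.38),
    (5, "Grok-4-1-fast-reasoning", "xAI", 69.57),
    (7, "Gemini-3-Pro-Preview", "Google", 68.14),
    (10, "Grok-4-0709", "xAI", 61.38),
    (16, "GPT-5.2-2025-12-11", "OpenAI", 55.87),
]


def best_per_provider(rows):
    """Return {provider: (model, score)}, keeping each provider's top score."""
    best = {}
    for _rank, model, provider, score in rows:
        if provider not in best or score > best[provider][1]:
            best[provider] = (model, score)
    return best


def is_rank_consistent(rows):
    """True if scores never increase as rank increases (ties are allowed)."""
    ordered = sorted(rows, key=lambda r: r[0])
    return all(a[3] >= b[3] for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Print providers ordered by their best score, highest first.
    ranked = sorted(best_per_provider(ROWS).items(), key=lambda kv: -kv[1][1])
    for provider, (model, score) in ranked:
        print(f"{provider:10s} {model:30s} {score:6.2f}")
    print("rank order consistent:", is_rank_consistent(ROWS))
```

The same two helpers would apply unchanged to the full 74-row list once it is loaded into the same tuple shape.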