Coding Benchmarks

Code generation and correctness on real-world programming projects

Last updated: 2026-05-31 | Source: Aider Polyglot | Scale: 0–100% (higher is better) | 68 models
| # | Model | Provider | Polyglot score (%) |
|---|---|---|---|
| 1 | gpt-5 (high) | OpenAI | 88.00 |
| 2 | gpt-5 (medium) | OpenAI | 86.70 |
| 3 | o3-pro (high) | OpenAI | 84.90 |
| 4 | gemini-2.5-pro-preview-06-05 (32k think) | Google | 83.10 |
| 5 | o3 (high) | OpenAI | 81.30 |
| 6 | gpt-5 (low) | OpenAI | 81.30 |
| 7 | grok-4 (high) | xAI | 79.60 |
| 8 | gemini-2.5-pro-preview-06-05 (default think) | Google | 79.10 |
| 9 | o3 (high) + gpt-4.1 | OpenAI | 78.20 |
| 10 | Gemini 2.5 Pro Preview 05-06 | Google | 76.90 |
| 11 | o3 | OpenAI | 76.90 |
| 12 | DeepSeek-V3.2-Exp (Reasoner) | DeepSeek | 74.20 |
| 13 | Gemini 2.5 Pro Preview 03-25 | Google | 72.90 |
| 14 | o4-mini (high) | OpenAI | 72.00 |
| 15 | claude-opus-4-20250514 (32k thinking) | Anthropic | 72.00 |
| 16 | DeepSeek R1 (0528) | DeepSeek | 71.40 |
| 17 | claude-opus-4-20250514 (no think) | Anthropic | 70.70 |
| 18 | DeepSeek-V3.2-Exp (Chat) | DeepSeek | 70.20 |
| 19 | claude-3-7-sonnet-20250219 (32k thinking tokens) | Anthropic | 64.90 |
| 20 | DeepSeek R1 + claude-3-5-sonnet-20241022 | Anthropic | 64.00 |
| 21 | o1-2024-12-17 (high) | OpenAI | 61.70 |
| 22 | claude-sonnet-4-20250514 (32k thinking) | Anthropic | 61.30 |
| 23 | o3-mini (high) | OpenAI | 60.40 |
| 24 | claude-3-7-sonnet-20250219 (no thinking) | Anthropic | 60.40 |
| 25 | Qwen3 235B A22B diff, no think, Alibaba API | unknown | 59.60 |
| 26 | Kimi K2 | unknown | 59.10 |
| 27 | DeepSeek R1 | DeepSeek | 56.90 |
| 28 | claude-sonnet-4-20250514 (no thinking) | Anthropic | 56.40 |
| 29 | DeepSeek V3 (0324) | DeepSeek | 55.10 |
| 30 | gemini-2.5-flash-preview-05-20 (24k think) | Google | 55.10 |
| 31 | Quasar Alpha | unknown | 54.70 |
| 32 | o3-mini (medium) | OpenAI | 53.80 |
| 33 | Grok 3 Beta | xAI | 53.30 |
| 34 | Optimus Alpha | unknown | 52.90 |
| 35 | gpt-4.1 | OpenAI | 52.40 |
| 36 | claude-3-5-sonnet-20241022 | Anthropic | 51.60 |
| 37 | Grok 3 Mini Beta (high) | xAI | 49.30 |
| 38 | DeepSeek Chat V3 (prev) | DeepSeek | 48.40 |
| 39 | gemini-2.5-flash-preview-04-17 (default) | Google | 47.10 |
| 40 | chatgpt-4o-latest (2025-03-29) | OpenAI | 45.30 |
| 41 | gpt-4.5-preview | OpenAI | 44.90 |
| 42 | gemini-2.5-flash-preview-05-20 (no think) | Google | 44.00 |
| 43 | gpt-oss-120b (high) | unknown | 41.80 |
| 44 | Qwen3 32B | unknown | 40.00 |
| 45 | gemini-exp-1206 | Google | 38.20 |
| 46 | Gemini 2.0 Pro exp-02-05 | Google | 35.60 |
| 47 | Grok 3 Mini Beta (low) | xAI | 34.70 |
| 48 | o1-mini-2024-09-12 | OpenAI | 32.90 |
| 49 | gpt-4.1-mini | OpenAI | 32.40 |
| 50 | claude-3-5-haiku-20241022 | Anthropic | 28.00 |
| 51 | chatgpt-4o-latest (2025-02-15) | OpenAI | 27.10 |
| 52 | QwQ-32B + Qwen 2.5 Coder Instruct | Alibaba | 26.20 |
| 53 | gpt-4o-2024-08-06 | unknown | 23.10 |
| 54 | gemini-2.0-flash-exp | Google | 22.20 |
| 55 | qwen-max-2025-01-25 | Alibaba | 21.80 |
| 56 | QwQ-32B | unknown | 20.90 |
| 57 | gpt-4o-2024-11-20 | unknown | 18.20 |
| 58 | gemini-2.0-flash-thinking-exp-01-21 | Google | 18.20 |
| 59 | DeepSeek Chat V2.5 | DeepSeek | 17.80 |
| 61 | Llama 4 Maverick | Meta | 15.60 |
| 62 | yi-lightning | 01.AI | 12.90 |
| 63 | command-a-03-2025-quality | Cohere | 12.00 |
| 64 | Codestral 25.01 | Mistral | 11.10 |
| 65 | openhands-lm-32b-v0.1 | unknown | 10.20 |
| 66 | gpt-4.1-nano | OpenAI | 8.90 |
| 67 | Qwen2.5-Coder-32B-Instruct | unknown | 8.00 |
| 68 | gemma-3-27b-it | unknown | 4.90 |
| 69 | gpt-4o-mini-2024-07-18 | unknown | 3.60 |