Coding Benchmarks
Code generation and correctness on real-world programming projects

| # | Model | Provider | Polyglot Score |
|---|---|---|---|
| 1 | gpt-5 (high) | OpenAI | |
| 2 | gpt-5 (medium) | OpenAI | |
| 3 | o3-pro (high) | OpenAI | |
| 4 | gemini-2.5-pro-preview-06-05 (32k think) | Google | |
| 5 | o3 (high) | OpenAI | |
| 6 | gpt-5 (low) | OpenAI | |
| 7 | grok-4 (high) | xAI | |
| 8 | gemini-2.5-pro-preview-06-05 (default think) | Google | |
| 9 | o3 (high) + gpt-4.1 | OpenAI | |
| 10 | Gemini 2.5 Pro Preview 05-06 | Google | |
| 11 | o3 | OpenAI | |
| 12 | DeepSeek-V3.2-Exp (Reasoner) | DeepSeek | |
| 13 | Gemini 2.5 Pro Preview 03-25 | Google | |
| 14 | o4-mini (high) | OpenAI | |
| 15 | claude-opus-4-20250514 (32k thinking) | Anthropic | |
| 16 | DeepSeek R1 (0528) | DeepSeek | |
| 17 | claude-opus-4-20250514 (no think) | Anthropic | |
| 18 | DeepSeek-V3.2-Exp (Chat) | DeepSeek | |
| 19 | claude-3-7-sonnet-20250219 (32k thinking tokens) | Anthropic | |
| 20 | DeepSeek R1 + claude-3-5-sonnet-20241022 | Anthropic | |
| 21 | o1-2024-12-17 (high) | OpenAI | |
| 22 | claude-sonnet-4-20250514 (32k thinking) | Anthropic | |
| 23 | o3-mini (high) | OpenAI | |
| 24 | claude-3-7-sonnet-20250219 (no thinking) | Anthropic | |
| 25 | Qwen3 235B A22B diff, no think, Alibaba API | Alibaba | |
| 26 | Kimi K2 | unknown | |
| 27 | DeepSeek R1 | DeepSeek | |
| 28 | claude-sonnet-4-20250514 (no thinking) | Anthropic | |
| 29 | DeepSeek V3 (0324) | DeepSeek | |
| 30 | gemini-2.5-flash-preview-05-20 (24k think) | Google | |
| 31 | Quasar Alpha | unknown | |
| 32 | o3-mini (medium) | OpenAI | |
| 33 | Grok 3 Beta | xAI | |
| 34 | Optimus Alpha | unknown | |
| 35 | gpt-4.1 | OpenAI | |
| 36 | claude-3-5-sonnet-20241022 | Anthropic | |
| 37 | Grok 3 Mini Beta (high) | xAI | |
| 38 | DeepSeek Chat V3 (prev) | DeepSeek | |
| 39 | gemini-2.5-flash-preview-04-17 (default) | Google | |
| 40 | chatgpt-4o-latest (2025-03-29) | OpenAI | |
| 41 | gpt-4.5-preview | OpenAI | |
| 42 | gemini-2.5-flash-preview-05-20 (no think) | Google | |
| 43 | gpt-oss-120b (high) | OpenAI | |
| 44 | Qwen3 32B | Alibaba | |
| 45 | gemini-exp-1206 | Google | |
| 46 | Gemini 2.0 Pro exp-02-05 | Google | |
| 47 | Grok 3 Mini Beta (low) | xAI | |
| 48 | o1-mini-2024-09-12 | OpenAI | |
| 49 | gpt-4.1-mini | OpenAI | |
| 50 | claude-3-5-haiku-20241022 | Anthropic | |
| 51 | chatgpt-4o-latest (2025-02-15) | OpenAI | |
| 52 | QwQ-32B + Qwen 2.5 Coder Instruct | Alibaba | |
| 53 | gpt-4o-2024-08-06 | OpenAI | |
| 54 | gemini-2.0-flash-exp | Google | |
| 55 | qwen-max-2025-01-25 | Alibaba | |
| 56 | QwQ-32B | Alibaba | |
| 57 | gpt-4o-2024-11-20 | OpenAI | |
| 58 | gemini-2.0-flash-thinking-exp-01-21 | Google | |
| 59 | DeepSeek Chat V2.5 | DeepSeek | |
| 61 | Llama 4 Maverick | Meta | |
| 62 | yi-lightning | 01.AI | |
| 63 | command-a-03-2025-quality | Cohere | |
| 64 | Codestral 25.01 | Mistral | |
| 65 | openhands-lm-32b-v0.1 | unknown | |
| 66 | gpt-4.1-nano | OpenAI | |
| 67 | Qwen2.5-Coder-32B-Instruct | Alibaba | |
| 68 | gemma-3-27b-it | Google | |
| 69 | gpt-4o-mini-2024-07-18 | OpenAI | |