Reasoning Benchmarks
General reasoning, instruction following, and language understanding
| # | Model | Provider | MT-bench score |
|---|---|---|---|
| 1 | GPT-4-1106-preview | OpenAI | |
| 2 | GPT-4-0613 | OpenAI | |
| 3 | Qwen2-72B-Instruct | Alibaba | |
| 4 | GPT-4-0314 | OpenAI | |
| 5 | Qwen1.5-110B-Chat | Alibaba | |
| 6 | Mistral Medium | Mistral | |
| 7 | Qwen1.5-72B-Chat | Alibaba | |
| 8 | GPT-3.5-Turbo-0613 | OpenAI | |
| 9 | GPT-3.5-Turbo-1106 | OpenAI | |
| 10 | Mixtral-8x7B-Instruct-v0.1 | Mistral | |
| 11 | Qwen1.5-32B-Chat | Alibaba | |
| 12 | Claude-2.1 | Anthropic | |
| 13 | Starling-LM-7B-beta | Nexusflow | |
| 14 | Starling-LM-7B-alpha | UC Berkeley | |
| 15 | Claude-2.0 | Anthropic | |
| 16 | GPT-3.5-Turbo-0314 | OpenAI | |
| 17 | Qwen1.5-14B-Chat | Alibaba | |
| 18 | Claude-1 | Anthropic | |
| 19 | Tulu-2-DPO-70B | AllenAI/UW | |
| 20 | Claude-Instant-1 | Anthropic | |
| 21 | OpenChat-3.5 | OpenChat | |
| 22 | OpenChat-3.5-0106 | OpenChat | |
| 23 | WizardLM-70B-v1.0 | Microsoft | |
| 24 | Qwen1.5-7B-Chat | Alibaba | |
| 25 | Mistral-7B-Instruct-v0.2 | Mistral | |
| 26 | SOLAR-10.7B-Instruct-v1.0 | Upstage AI | |
| 27 | NV-Llama2-70B-SteerLM-Chat | Nvidia | |
| 28 | Zephyr-7B-beta | HuggingFace | |
| 29 | WizardLM-13b-v1.2 | Microsoft | |
| 30 | Vicuna-33B | LMSYS | |
| 31 | WizardLM-30B | Microsoft | |
| 32 | Qwen-14B-Chat | Alibaba | |
| 33 | Vicuna-13B-16k | LMSYS | |
| 34 | Zephyr-7B-alpha | HuggingFace | |
| 35 | Llama-2-70B-chat | Meta | |
| 36 | Mistral-7B-Instruct-v0.1 | Mistral | |
| 37 | WizardLM-13B-v1.1 | Microsoft | |
| 38 | Llama-2-13b-chat | Meta | |
| 39 | Vicuna-13B | LMSYS | |
| 40 | Guanaco-33B | UW | |
| 41 | Tulu-30B | AllenAI/UW | |
| 42 | Guanaco-65B | UW | |
| 43 | OpenAssistant-LLaMA-30B | OpenAssistant | |
| 44 | PaLM-Chat-Bison-001 | Google | |
| 45 | MPT-30B-chat | MosaicML | |
| 46 | WizardLM-13B-v1.0 | Microsoft | |
| 47 | Llama-2-7B-chat | Meta | |
| 48 | Vicuna-7B-16k | LMSYS | |
| 49 | Vicuna-7B | LMSYS | |
| 50 | Baize-v2-13B | UCSD | |
| 51 | XGen-7B-8K-Inst | Salesforce | |
| 52 | Nous-Hermes-13B | NousResearch | |
| 53 | MPT-7B-Chat | MosaicML | |
| 54 | GPT4All-13B-Snoozy | Nomic AI | |
| 55 | Koala-13B | UC Berkeley | |
| 56 | MPT-30B-Instruct | MosaicML | |
| 57 | Falcon-40B-Instruct | TII | |
| 58 | ChatGLM2-6B | Tsinghua | |
| 59 | H2O-Oasst-OpenLLaMA-13B | h2oai | |
| 60 | Alpaca-13B | Stanford | |
| 61 | ChatGLM-6B | Tsinghua | |
| 62 | OpenAssistant-Pythia-12B | OpenAssistant | |
| 63 | RWKV-4-Raven-14B | RWKV | |
| 64 | Dolly-V2-12B | Databricks | |
| 65 | FastChat-T5-3B | LMSYS | |
| 66 | StableLM-Tuned-Alpha-7B | Stability AI | |
| 67 | LLaMA-13B | Meta | |