OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

☆1,438

Alternatives and similar repositories for llm-colosseum

Users that are interested in llm-colosseum are comparing it to the libraries listed below


OS-Copilot / OS-Copilot
A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
☆1,668 · Updated 10 months ago
McGill-NLP / webllama
Llama-3 agents that can browse the web by following instructions and talking to you
☆1,411 · Updated 7 months ago
xlang-ai / OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆1,978 · Updated this week
SqueezeAILab / LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
☆1,715 · Updated last year
MLSysOps / MLE-agent
🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrate with arxiv and paper with code to provide…
☆1,321 · Updated this week
agiresearch / AIOS
AIOS: AI Agent Operating System
☆4,383 · Updated last week
togethercomputer / MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
☆2,787 · Updated 6 months ago
bin123apple / AutoCoder
We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 …
☆851 · Updated last year
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆984 · Updated 11 months ago
mistralai-sf24 / hackathon
☆447 · Updated last year
ShengranHu / ADAS
[ICLR 2025] Automated Design of Agentic Systems
☆1,380 · Updated 5 months ago
OpenCodeInterpreter / OpenCodeInterpreter
OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist…
☆1,668 · Updated last year
cohere-ai / cohere-toolkit
Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
☆3,069 · Updated last week
KhoomeiK / LlamaGym
Fine-tune LLM agents with online reinforcement learning
☆1,204 · Updated last year
mistralai / megablocks-public
☆864 · Updated last year
gabrielchua / RAGxplorer
Open-source tool to visualise your RAG 🔮
☆1,145 · Updated 6 months ago
VILA-Lab / ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxi…
☆967 · Updated last year
Codium-ai / AlphaCodium
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
☆3,869 · Updated 7 months ago
lichao-sun / Mora
Mora: More like Sora for Generalist Video Generation
☆1,565 · Updated 9 months ago
AnswerDotAI / fsdp_qlora
Training LLMs with QLoRA + FSDP
☆1,494 · Updated 8 months ago
trotsky1997 / MathBlackBox
☆1,027 · Updated 7 months ago
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,394 · Updated last year
microsoft / promptbench
A unified evaluation framework for large language models
☆2,663 · Updated 2 weeks ago
Link-AGI / AutoAgents
[IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks.
☆1,382 · Updated last year
NousResearch / Hermes-Function-Calling
☆938 · Updated 10 months ago
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆2,683 · Updated 5 months ago
WecoAI / aideml
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
☆959 · Updated this week
mistralai / mistral-common
Official inference library for pre-processing of Mistral models
☆760 · Updated this week
mistralai / mistral-finetune
☆2,986 · Updated 10 months ago
Eladlev / AutoPrompt
A framework for prompt tuning using Intent-based Prompt Calibration
☆2,688 · Updated 3 months ago