OpenGenerativeAI / llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
☆1,438Updated 4 months ago
Alternatives and similar repositories for llm-colosseum
Users that are interested in llm-colosseum are comparing it to the libraries listed below
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.☆1,668Updated 10 months ago
- Llama-3 agents that can browse the web by following instructions and talking to you☆1,411Updated 7 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments☆1,978Updated this week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling☆1,715Updated last year
- 🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrate with arxiv and paper with code to provide…☆1,321Updated this week
- AIOS: AI Agent Operating System☆4,383Updated last week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models☆2,787Updated 6 months ago
- We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 …☆851Updated last year
- Reaching LLaMA2 Performance with 0.1M Dollars☆984Updated 11 months ago
- ☆447Updated last year
- [ICLR 2025] Automated Design of Agentic Systems☆1,380Updated 5 months ago
- OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist…☆1,668Updated last year
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.☆3,069Updated last week
- Fine-tune LLM agents with online reinforcement learning☆1,204Updated last year
- ☆864Updated last year
- Open-source tool to visualise your RAG 🔮☆1,145Updated 6 months ago
- A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxi…☆967Updated last year
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"☆3,869Updated 7 months ago
- Mora: More like Sora for Generalist Video Generation☆1,565Updated 9 months ago
- Training LLMs with QLoRA + FSDP☆1,494Updated 8 months ago
- ☆1,027Updated 7 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI☆1,394Updated last year
- A unified evaluation framework for large language models☆2,663Updated 2 weeks ago
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks.☆1,382Updated last year
- ☆938Updated 10 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆2,683Updated 5 months ago
- AIDE: AI-Driven Exploration in the Space of Code. The machine Learning engineering agent that automates AI R&D.☆959Updated this week
- Official inference library for pre-processing of Mistral models☆760Updated this week
- ☆2,986Updated 10 months ago
- A framework for prompt tuning using Intent-based Prompt Calibration☆2,688Updated 3 months ago