OpenGenerativeAI / llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
☆1,434 · Updated 3 months ago
Alternatives and similar repositories for llm-colosseum
Users who are interested in llm-colosseum are comparing it to the libraries listed below:
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,407 · Updated 6 months ago
- ☆2,973 · Updated 9 months ago
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,704 · Updated 11 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆1,946 · Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,064 · Updated 10 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,663 · Updated 9 months ago
- [ICLR 2025] Automated Design of Agentic Systems ☆1,345 · Updated 5 months ago
- Harness LLMs with Multi-Agent Programming ☆3,432 · Updated this week
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆2,639 · Updated 5 months ago
- Run Mixtral-8x7B models in Colab or consumer desktops ☆2,310 · Updated last year
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,761 · Updated 5 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,165 · Updated last month
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,904 · Updated 10 months ago
- An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents ☆5,634 · Updated 9 months ago
- Tools for merging pretrained large language models. ☆5,853 · Updated last week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆983 · Updated 11 months ago
- A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta. ☆2,866 · Updated this week
- AIOS: AI Agent Operating System ☆4,280 · Updated 2 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,788 · Updated last week
- Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api ☆1,138 · Updated 3 weeks ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆1,756 · Updated 6 months ago
- Open-source tool to visualise your RAG 🔮 ☆1,136 · Updated 5 months ago
- The Open Source Memory Layer For Autonomous Agents ☆2,260 · Updated 8 months ago
- ☆446 · Updated last year
- Fine-tune LLM agents with online reinforcement learning ☆1,196 · Updated last year
- [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs ☆1,673 · Updated last week
- Superfast AI decision making and intelligent processing of multi-modal data. ☆2,651 · Updated this week
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,385 · Updated last year
- [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist w… ☆839 · Updated 2 months ago
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆2,692 · Updated 3 months ago