OpenGenerativeAI / llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
☆1,463Updated 10 months ago
Alternatives and similar repositories for llm-colosseum
Users that are interested in llm-colosseum are comparing it to the libraries listed below
- Llama-3 agents that can browse the web by following instructions and talking to you☆1,407Updated last year
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.☆1,747Updated last year
- ☆445Updated last year
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.☆3,154Updated this week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models☆2,852Updated last year
- The first open-source Artificial Narrow Intelligence generalist agentic framework Computer-Using-Agent that fully operates graphical-user…☆1,326Updated 11 months ago
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling☆1,823Updated last year
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL☆2,659Updated last week
- Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api☆1,212Updated 8 months ago
- ☆3,071Updated 2 months ago
- Open-source tool to visualise your RAG 🔮☆1,216Updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI☆1,407Updated last year
- [ICLR 2025] Automated Design of Agentic Systems☆1,506Updated last year
- 🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrate with arxiv and paper with code to provide…☆1,510Updated 6 months ago
- We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 …☆854Updated last year
- Reaching LLaMA2 Performance with 0.1M Dollars☆987Updated last year
- Run Mixtral-8x7B models in Colab or consumer desktops☆2,325Updated last year
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments☆2,528Updated last week
- OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist…☆1,700Updated last year
- Ship RAG based LLM web apps in seconds.☆1,005Updated 2 years ago
- Training LLMs with QLoRA + FSDP☆1,539Updated last year
- An autoagentic AGI that is self-evolving and modular.☆961Updated last year
- Deploy your agentic workflows to production☆2,073Updated last week
- Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.☆1,130Updated 3 months ago
- Mora: More like Sora for Generalist Video Generation☆1,581Updated last year
- Proof-of-concept prototype for generating and querying against an ever-expanding knowledge graph with AI☆928Updated last year
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones☆1,307Updated this week
- A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-be…☆3,053Updated 9 months ago
- The Open Source Memory Layer For Autonomous Agents☆2,562Updated last year
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥☆1,684Updated last year