rungalileo / agent-leaderboard
Ranking LLMs on agentic tasks
☆209 Updated 2 months ago
Alternatives and similar repositories for agent-leaderboard
Users who are interested in agent-leaderboard are comparing it to the repositories listed below
- ☆237 Updated last month
- Readymade evaluators for agent trajectories ☆454 Updated 4 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆141 Updated 4 months ago
- Tutorial for building LLM router ☆242 Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆144 Updated 11 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆495 Updated 4 months ago
- Testing and evaluation framework for voice agents ☆162 Updated 7 months ago
- A bot with memory, built on LangGraph Cloud. ☆146 Updated last year
- Terminal-based AI Coding Agent, similar to Claude Code, OpenAI Codex etc. but works with many more LLMs e.g. Gemini, Groq, Deepseek ☆151 Updated 8 months ago
- ☆220 Updated 6 months ago
- ☆182 Updated this week
- ☆147 Updated last year
- ☆182 Updated 11 months ago
- Training setup for Langchain's Open Deep Research ☆74 Updated 4 months ago
- ☆76 Updated last year
- Repository demonstrating best practices and patterns for implementing agentic workflows in Python, featuring modular, scalable, and reusa… ☆184 Updated last year
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆113 Updated last year
- ☆80 Updated 3 months ago
- ☆120 Updated 5 months ago
- MCP (Model Context Protocol) server for Weaviate ☆162 Updated 7 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆103 Updated 5 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆128 Updated 11 months ago
- Together Open Deep Research ☆354 Updated 9 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! ☆464 Updated last year
- An example of multi-agent orchestration with llama-index ☆445 Updated 11 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆302 Updated last month
- ☆104 Updated 9 months ago
- An agent benchmark with tasks in a simulated software company. ☆626 Updated 2 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ☆135 Updated last year
- Routing on Random Forest (RoRF) ☆239 Updated last year