rungalileo / agent-leaderboard
Ranking LLMs on agentic tasks
☆200 Updated 3 weeks ago
Alternatives and similar repositories for agent-leaderboard
Users who are interested in agent-leaderboard are comparing it to the libraries listed below.
- Tutorial for building an LLM router ☆236 Updated last year
- ☆234 Updated 2 weeks ago
- Readymade evaluators for agent trajectories ☆415 Updated 3 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆138 Updated 3 months ago
- ☆182 Updated 9 months ago
- A list of AI memory projects ☆255 Updated 11 months ago
- Repository demonstrating best practices and patterns for implementing agentic workflows in Python, featuring modular, scalable, and reusa… ☆178 Updated last year
- A bot with memory, built on LangGraph Cloud. ☆141 Updated last year
- Testing and evaluation framework for voice agents ☆160 Updated 6 months ago
- ☆79 Updated 2 months ago
- ☆74 Updated last year
- ☆101 Updated 8 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆138 Updated 9 months ago
- ☆219 Updated 5 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆479 Updated 3 months ago
- Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation ☆108 Updated 11 months ago
- Semantic Chunker is a lightweight Python package for semantically-aware chunking and clustering of text. ☆283 Updated 7 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆179 Updated last year
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆123 Updated 10 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! ☆461 Updated last year
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆114 Updated last year
- ☆148 Updated last year
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆242 Updated last year
- ☆176 Updated last week
- A Lightweight Library for AI Observability ☆252 Updated 9 months ago
- ☆125 Updated 9 months ago
- ☆113 Updated 4 months ago
- A clean, modular SDK for building AI agents with OpenHands V1. ☆265 Updated this week
- Optimized Large Language Models for Financial Applications: Efficient, Scalable, and Domain-Specific AI for Finance. ☆50 Updated 5 months ago
- Research assistant for performing online research on a given topic, using LlamaIndex Workflows and the Tavily API. Inspired by GPT-Researcher. ☆168 Updated last year