kolenaIO / autoarena

Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation

☆105

Alternatives and similar repositories for autoarena

Users who are interested in autoarena are comparing it to the libraries listed below.

ammirsm / llamaindex-omakase-rag
This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d…
☆146 Updated last year
Not-Diamond / RoRF
Routing on Random Forest (RoRF)
☆181 Updated 10 months ago
darshil3011 / AutoMetaRAG
Dynamic Metadata based RAG Framework
☆75 Updated last year
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆76 Updated 9 months ago
anyscale / llm-router
Tutorial for building an LLM router
☆220 Updated last year
run-llama / human_in_the_loop_workflow_demo
☆74 Updated 10 months ago
run-llama / workflows-py
Workflows are an event-driven, async-first, step-based way to control the execution flow of AI applications like agents.
☆154 Updated this week
rsrohan99 / Llama-Researcher
Research assistant for performing online research on a given topic, using LlamaIndex Workflows and the Tavily API. Inspired by GPT-Researcher.
☆163 Updated 10 months ago
run-llama / gmail-extractor
☆123 Updated last year
run-llama / llama_extract
☆122 Updated 5 months ago
mendableai / rag-arena
Open-source RAG evaluation through users' feedback
☆194 Updated last year
tg1482 / priomptipy
A Python implementation of priompt - a neat way of managing context from diverse sources for LLM applications.
☆112 Updated 3 weeks ago
mtwn105 / decipher-research-agent
Turn topics, links, and files into AI-generated research notebooks — summarize, explore, and ask anything.
☆129 Updated last month
SerjSmor / open-intent-classifier
A project that identifies and classifies the intent of a message using dynamic labels
☆43 Updated 7 months ago
topoteretes / awesome-ai-memory
A list of AI memory projects
☆179 Updated 6 months ago
telekom / create-tsi
Create-tsi is a generative AI RAG toolkit that generates AI applications with low code.
☆234 Updated 8 months ago
saharmor / voice-lab
Testing and evaluation framework for voice agents
☆129 Updated last month
parea-ai / parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
☆78 Updated 5 months ago
agentsea / surfkit
A toolkit for building computer use AI agents
☆170 Updated last month
weaviate / gorilla
Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.
☆133 Updated last month
LyzrCore / lyzr-automata
Low-code multi-agent automation framework
☆255 Updated last year
LyzrCore / lyzr
Lyzr SDKs help you to build all your favorite GenAI SaaS products as enterprise applications in minutes.
☆181 Updated 7 months ago
chisasaw / redcache-ai
A memory framework for Large Language Models and Agents.
☆183 Updated 7 months ago
CYQIQ / MultiCoT
Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph
☆146 Updated last year
cfahlgren1 / observers
A Lightweight Library for AI Observability
☆249 Updated 5 months ago
lz-chen / research-agent
☆87 Updated 2 months ago
rsrohan99 / dynamic-few-shot-llamaindex-workflow
☆53 Updated 9 months ago
chrislatimer / microagent
A fork of OpenAI Swarm that supports Groq and Anthropic
☆121 Updated 5 months ago
diicellman / dspy-gradio-rag
RAG example using DSPy, Gradio, FastAPI
☆83 Updated last year
misbahsy / RAGTune
Tuning and evaluation of RAG pipelines. (Automated optimization to be added soon.)
☆264 Updated last year