aymeric-roucher / agent_reasoning_benchmark
Compare how Agent systems perform on several benchmarks.
☆47 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for agent_reasoning_benchmark
- Beating the GAIA benchmark with Transformers Agents. ☆62 Updated 3 weeks ago
- ☆103 Updated 3 months ago
- Automating enterprise workflows with multimodal agents ☆94 Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 4 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆80 Updated 2 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆64 Updated this week
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆96 Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Track the progress of LLM context utilisation ☆53 Updated 4 months ago
- Official homepage for "Self-Harmonized Chain of Thought" ☆83 Updated 2 months ago
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker. ☆47 Updated 10 months ago
- RAGElo is a set of tools that helps you selecting the best RAG-based LLM agents by using an Elo ranker ☆106 Updated 3 weeks ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆41 Updated 11 months ago
- EcoAssistant: using LLM assistant more affordably and accurately ☆129 Updated 4 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆65 Updated 4 months ago
- ☆40 Updated 2 weeks ago
- Evaluating LLMs with CommonGen-Lite ☆85 Updated 8 months ago
- DSPY on action with OpenSource LLMs. ☆57 Updated 7 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated last month
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F… ☆58 Updated 5 months ago
- A Ruby on Rails style framework for the DSPy (Demonstrate, Search, Predict) project for Language Models like GPT, BERT, and LLama. ☆110 Updated last month
- ☆78 Updated this week
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆146 Updated 7 months ago
- ☆64 Updated last month
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆97 Updated 7 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆133 Updated last month
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- ☆87 Updated 9 months ago
- ☆41 Updated 2 months ago
- Experimental Code for StructuredRAG: Structured Outputs in Retrieval-Augmented Generation ☆94 Updated this week