aymeric-roucher / agent_reasoning_benchmark
Compare how Agent systems perform on several benchmarks.
☆103 Updated 6 months ago
Alternatives and similar repositories for agent_reasoning_benchmark
Users that are interested in agent_reasoning_benchmark are comparing it to the libraries listed below
- ☆125 Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆120 Updated 3 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆112 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆80 Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆101 Updated 2 years ago
- ☆84 Updated 2 years ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 Updated last year
- EcoAssistant: using LLM assistant more affordably and accurately ☆134 Updated last year
- ☆130 Updated last year
- LLM reads a paper and produces a working prototype ☆60 Updated 10 months ago
- Beating the GAIA benchmark with Transformers Agents. ☆146 Updated 11 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ☆193 Updated 2 weeks ago
- Repository for "PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers", NAACL24 ☆151 Updated last year
- ☆56 Updated last year
- ☆236 Updated 3 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆239 Updated 4 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆128 Updated last year
- ☆123 Updated last year
- Evaluating LLMs with CommonGen-Lite ☆94 Updated last year
- This repository implements the chain of verification paper by Meta AI ☆196 Updated 2 years ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆77 Updated last year
- ☆43 Updated last year
- ☆61 Updated 7 months ago
- Synthetic Data for LLM Fine-Tuning ☆121 Updated 2 years ago
- ☆131 Updated 9 months ago
- ☆147 Updated last year
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆91 Updated 5 months ago
- ☆120 Updated last year
- An implementation of Everything of Thoughts (XoT). ☆155 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated last year