aymeric-roucher / agent_reasoning_benchmark
Compare how Agent systems perform on several benchmarks.
★98 Updated 8 months ago
Alternatives and similar repositories for agent_reasoning_benchmark
Users that are interested in agent_reasoning_benchmark are comparing it to the libraries listed below
- ★121 Updated 10 months ago
- Beating the GAIA benchmark with Transformers Agents. ★123 Updated 4 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ★112 Updated 9 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ★115 Updated 4 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ★57 Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ★78 Updated 9 months ago
- Train your own SOTA deductive reasoning model ★94 Updated 3 months ago
- ★178 Updated 10 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ★81 Updated 8 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ★91 Updated 5 months ago
- Accompanying material for the sleep-time compute paper ★95 Updated last month
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools" ★109 Updated 2 weeks ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ★66 Updated 11 months ago
- Simple examples using Argilla tools to build AI ★53 Updated 7 months ago
- ★118 Updated 9 months ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F… ★66 Updated last year
- ★69 Updated 4 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ★30 Updated 2 months ago
- ★123 Updated 8 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" ★70 Updated 6 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ★97 Updated last year
- ★41 Updated 6 months ago
- A framework for few-shot evaluation of language models. ★33 Updated 3 months ago
- ★75 Updated 5 months ago
- ★86 Updated 2 weeks ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ★101 Updated 4 months ago
- Verifiers for LLM Reinforcement Learning ★60 Updated 2 months ago
- Repository for "PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers", NAACL24 ★139 Updated last year
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ★212 Updated last week
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ★225 Updated 7 months ago