aymeric-roucher/agent_reasoning_benchmark



🔧 Compare how Agent systems perform on several benchmarks. 📊🚀

☆102

Alternatives and similar repositories for agent_reasoning_benchmark

Users who are interested in agent_reasoning_benchmark are comparing it to the repositories listed below.


aymeric-roucher / GAIA
Beating the GAIA benchmark with Transformers Agents. 🚀
☆153 · Feb 19, 2025 · Updated last year
YuCheng1106 / PromptSapper
☆18 · Jun 26, 2024 · Updated 2 years ago
jingjiang02 / M2CAN
Code for Information Fusion 2025 Paper "Multi-Source Multi-Modal Domain Adaptation"
☆20 · Feb 4, 2025 · Updated last year
shivanshkaushikk / rag-fusion
RAG-Fusion implementation using Langchain, Weaviate and OpenAI
☆13 · Oct 31, 2023 · Updated 2 years ago
run-llama / pdf-viewer
Display PDFs in your RAG app
☆20 · Feb 24, 2025 · Updated last year
raytheonbbn / landsar-base
The LandSAR search and rescue platform
☆12 · Dec 19, 2025 · Updated 7 months ago
Ag2S1 / Sibyl-System
☆125 · Aug 13, 2024 · Updated last year
zjunlp / MachineSoM
[ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View
☆119 · Jun 6, 2025 · Updated last year
Marcnuth / deduplication
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆18 · Aug 28, 2023 · Updated 2 years ago
AlexCheema / tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
☆18 · Feb 14, 2025 · Updated last year
nec-research / agentquest
☆29 · Apr 3, 2025 · Updated last year
wangbx66 / differentially-private-q-learning
☆13 · May 16, 2019 · Updated 7 years ago
henryzhao5852 / DELFT
☆12 · Feb 26, 2020 · Updated 6 years ago
ExpressAI / AI-Gaokao
Gaokao Benchmark for AI
☆109 · Jul 8, 2022 · Updated 4 years ago
victorvikram / ConceptARC
Materials for ConceptARC paper
☆119 · Feb 10, 2026 · Updated 5 months ago
fuyuanlyu / OptFS
This repository contains PyTorch implementation of WWW 2023 research paper: Optimizing Feature Set for Click-through Rate Prediction.
☆12 · Oct 23, 2023 · Updated 2 years ago
modal-labs / ci-on-modal
A sample pattern for running CI tests on Modal
☆19 · Apr 12, 2025 · Updated last year
chonghin33 / lcm-1.13-whitepaper
This project contains the original white paper for Language Construct Modeling (LCM) v1.13, authored by Vincent Shing Hin Chong. It intro…
☆15 · Jul 23, 2025 · Updated 11 months ago
UT-SysML / rumors-in-multi-agent
Code for AAAI Workshop WMAC Paper "Simulating Rumor Spreading in Social Networks using LLM agents"
☆13 · Feb 20, 2025 · Updated last year
LydiaXiaohongLi / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆19 · Jul 20, 2023 · Updated 3 years ago
NumberChiffre / mcts-llm
☆98 · Dec 16, 2024 · Updated last year
awslabs / aws-cv-unique-information
We define and estimate smooth unique information of samples with respect to classifier weights and predictions. We compute these quantiti…
☆11 · Mar 9, 2021 · Updated 5 years ago
pizofreude / metaprompt
Metaprompt is an AI-powered prompt generator developed by Anthropic. This is the unofficial Metaprompt Community Github repo. All PRs are…
☆14 · Mar 19, 2024 · Updated 2 years ago
allenai / clarifydelphi
☆13 · Apr 24, 2024 · Updated 2 years ago
cyfml / OPSTL
OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments
☆14 · Oct 25, 2023 · Updated 2 years ago
bhimrazy / chat-with-phi-3-vision
Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ…
☆34 · Jan 2, 2025 · Updated last year
KyujinHan / Sakura-SOLAR-DPO
Sakura-SOLAR-DPO: Merge, SFT, and DPO
☆116 · Dec 30, 2023 · Updated 2 years ago
sencheng / CoBeL-RL
Closed-loop simulator of complex behavior and learning based on reinforcement learning and deep neural networks
☆15 · Mar 20, 2026 · Updated 4 months ago
Orlando-CS / Awesome-RL-in-Generative-AI
✨✨latest advancements of RL in generative ai
☆16 · Aug 18, 2025 · Updated 11 months ago
lucylow / Covid_Control
Machine learning to predict future number Covid19 Daily Cases (7-day moving average). Long Short Term Memory (LSTM) Predictor and Reinfor…
☆14 · Feb 21, 2021 · Updated 5 years ago
camenduru / OneFormer-colab
☆14 · Dec 26, 2023 · Updated 2 years ago
aiovine / converse-dataset
Natural language dataset for training a Conversational Recommender System
☆11 · Jul 9, 2019 · Updated 7 years ago
yulkang / pylabyk
Pylab-style utilities for PyTorch and Matplotlib, among others.
☆16 · Jan 22, 2026 · Updated 5 months ago
Shiguang-Guo / Open-Grounded-Planning
☆11 · Jun 11, 2024 · Updated 2 years ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149 · Nov 26, 2024 · Updated last year
othr-nlp / rage_toolkit
☆11 · Sep 27, 2024 · Updated last year
raytheonbbn / landsar-sdk
The motion model software development kit for the LandSAR search and rescue software platform
☆19 · Feb 3, 2026 · Updated 5 months ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
Liyubov / heterogeneous-dynamics-on-networks
code for epidemics spreading, heterogeneous random walk on network
☆13 · Apr 12, 2021 · Updated 5 years ago