cxcscmu/deepresearch_benchmarking

☆29

Alternatives and similar repositories for deepresearch_benchmarking

Users who are interested in deepresearch_benchmarking compare it to the repositories listed below.

OPPO-PersonalAI / FINDER_DEFT
Official implementation for the paper "How Far Are We from Genuinely Useful Deep Research Agents?"
☆66 · Dec 10, 2025 · Updated 7 months ago
Fu-Dayuan / AgentRefine
(ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning
☆20 · Nov 22, 2025 · Updated 8 months ago
chuzhumin98 / PRE
A general framework for evaluating the performance of large language models (LLMs) based on a peer-review mechanism among LLMs
☆19 · Aug 3, 2024 · Updated last year
RUCAIBox / R1-Searcher-plus
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
☆81 · May 25, 2025 · Updated last year
RUCAIBox / SimpleDeepSearcher
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
☆120 · Jun 3, 2025 · Updated last year
Ayanami0730 / deep_research_bench
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆796 · May 11, 2026 · Updated 2 months ago
NJU-LINK / DR3-Eval
☆39 · May 7, 2026 · Updated 2 months ago
YuyaoZhangQAQ / QCompiler
This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.
☆17 · Oct 20, 2025 · Updated 9 months ago
kimdanny / Fair-RAG
ICTIR 2025 "Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation"
☆15 · Sep 19, 2024 · Updated last year
oneal2000 / SR-Agents
SRA-Bench and SR-Agents: a benchmark and toolkit for skill-retrieval-augmented LLM agents.
☆93 · Jul 2, 2026 · Updated 3 weeks ago
NJU-LINK / OmniVideoBench
The Source Code for OmniVideoBench @ICLR 2026
☆77 · Feb 12, 2026 · Updated 5 months ago
youdotcom-oss / ydc-deep-research-evals
you.com's framework for evaluating deep research systems.
☆75 · May 15, 2025 · Updated last year
smallporridge / TrustworthyRAG
☆16 · May 18, 2026 · Updated 2 months ago
ysh-1998 / CoWPiRec
The official implementation for Collaborative Word-based Pre-trained Item Representation for Transferable Recommendation.
☆25 · Jan 30, 2024 · Updated 2 years ago
rickyang1114 / multimodal-deepresearcher
[AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
☆57 · Jun 8, 2026 · Updated last month
IssamLaradji / wisenet
☆10 · Nov 8, 2020 · Updated 5 years ago
yrahal / paircoder
☆12 · May 1, 2023 · Updated 3 years ago
plageon / HierSearch
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
☆40 · Oct 9, 2025 · Updated 9 months ago
OpenBMB / RAG-DDR
This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".
☆23 · Oct 28, 2024 · Updated last year
OpenBMB / MetaMem
[ACL '26] This is the code repo for our ACL '26 Findings paper "MetaMem: Evolving Meta-Memory for Knowledge Utilization through Self-Refl…
☆39 · Jul 2, 2026 · Updated 3 weeks ago
sunblaze-ucb / AgentSynth
[ICLR 2026] AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
☆49 · Apr 17, 2026 · Updated 3 months ago
YunjiaXi / InfoDeepSeek
Code for InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation
☆18 · May 29, 2025 · Updated last year
mcp-tool-bench / MCPToolBenchPP
MCPToolBench++ MCP Model Context Protocol Tool Use Benchmark on AI Agent and Model Tool Use Ability
☆44 · Mar 17, 2026 · Updated 4 months ago
thu-coai / CharacterBench
[AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
☆23 · Aug 1, 2025 · Updated 11 months ago
namespace-Pt / TSGen
☆13 · Oct 28, 2024 · Updated last year
linjh1118 / WisdoMentor
WisdoMentor - Series: An LLM for undergraduates | 博导智言 (study assistance for university students)
☆13 · May 9, 2024 · Updated 2 years ago
8421BCD / Agentic-R
[ACL 2026 Findings] Repo for paper "Agentic-R: Learning to Retrieve for Agentic Search"
☆91 · Apr 9, 2026 · Updated 3 months ago
Linn3a / siren
Official implementation of Selective Entropy Regularization (SIREN), proposed by paper 'Rethinking Entropy Regularization in Large Reason…
☆32 · Dec 10, 2025 · Updated 7 months ago
linjh1118 / Llama3-Chinese-ORPO
A Chinese version of Llama3, obtained from Llama3 through further CPT, SFT, and ORPO
☆16 · Apr 24, 2024 · Updated 2 years ago
IssamLaradji / GP_DRF
Official code for "Efficient Deep Gaussian Process Models for Variable-Sized Inputs" - accepted in IJCNN2019
☆15 · Jul 17, 2019 · Updated 7 years ago
oneal2000 / JuDGE
Code for JuDGE, SIGIR 2025 Long Paper
☆35 · Aug 7, 2025 · Updated 11 months ago
hkust-nlp / WebExplorer
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆120 · Sep 29, 2025 · Updated 9 months ago
DeepExperience / agent2world
🪐 Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
☆23 · Jan 29, 2026 · Updated 5 months ago
eleonoreft / DelibAnalysis
Project Discourse Quality for political deliberations online.
☆10 · Feb 1, 2021 · Updated 5 years ago
ByebyeMonica / Reasoning-Agentic-RAG
Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
☆30 · May 14, 2025 · Updated last year
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
DaoD / DCL
From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking
☆14 · Oct 25, 2022 · Updated 3 years ago
ChuangtaoChen-TUM / KVPacket
☆31 · Updated this week
F2-Song / Weak-to-Strong-Decoding
The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"
☆22 · Jun 26, 2025 · Updated last year