Infinity-AILab/DeepResearchEval

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation.

☆141

Alternatives and similar repositories for DeepResearchEval

Users interested in DeepResearchEval are comparing it to the repositories listed below.

w-yibo / VTC-R1
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26 · Feb 20, 2026 · Updated 5 months ago
w-yibo / R1-Compress
[NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
☆17 · Jan 24, 2026 · Updated 5 months ago
UniX-AI-Lab / WorldReasonBench
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
☆22 · May 19, 2026 · Updated 2 months ago
StarDewXXX / UltraHorizon
Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
☆27 · Sep 30, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / OpenMMReasoner
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164 · Mar 30, 2026 · Updated 3 months ago
HJYao00 / MM-DeepResearch
MLLM, DeepResearch, Agentic AI
☆18 · Jun 1, 2026 · Updated last month
MiroMindAI / MiroTrain
MiroTrain is an efficient, algorithm-first framework for research agents.
☆142 · Aug 27, 2025 · Updated 10 months ago
MiroMindAI / MiroEval
MiroEval: A benchmark and evaluation framework for deep research agents; 100 tasks (70 text, 30 multimodal) assessed across synthesis qu…
☆46 · Jul 6, 2026 · Updated 2 weeks ago
StarDewXXX / AdaR1
The official repository for the NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 · May 6, 2026 · Updated 2 months ago
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
MiroMindAI / MiroMind-M1
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.
☆279 · Aug 12, 2025 · Updated 11 months ago
EvolvingLMMs-Lab / LongVT
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆254 · Jun 24, 2026 · Updated 3 weeks ago
Lslland / T-Vaccine
☆19 · Jun 21, 2025 · Updated last year
StarDewXXX / Awesome-Hybrid-CoT-Reasoning
☆62 · Jun 7, 2025 · Updated last year
StarDewXXX / O1-Pruner
Official repository for the paper "O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning"
☆99 · Feb 21, 2025 · Updated last year
MiroMindAI / MiroRL
MiroRL is an MCP-first reinforcement learning framework for deep research agents.
☆246 · Aug 27, 2025 · Updated 10 months ago
mzf666 / MATPO
Official implementation of MATPO: Multi-Agent Tool-Integrated Policy Optimization.
☆82 · Oct 31, 2025 · Updated 8 months ago
qhjqhj00 / MemoBrain
Executive Memory for Coherent Long-Horizon Reasoning!
☆84 · Jan 14, 2026 · Updated 6 months ago
HKUDS / DeepResearch-Eval
DeepResearch-Eval: An End-to-End Evaluation Framework for DeepResearch Systems
☆49 · Oct 16, 2025 · Updated 9 months ago
HJYao00 / R1-ShareVL
[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward
☆38 · Sep 19, 2025 · Updated 10 months ago
tanganke / subspace_fusion
Code for the paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14 · Mar 28, 2024 · Updated 2 years ago
KongYilun / M3DT
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
☆22 · Sep 18, 2025 · Updated 10 months ago
Ayanami0730 / deep_research_bench
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆793 · May 11, 2026 · Updated 2 months ago
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆35 · Mar 11, 2025 · Updated last year
RUC-NLPIR / iAgent
Includes 12+ cutting-edge agent systems across multiple research directions.
☆35 · Nov 10, 2025 · Updated 8 months ago
Algolzw / self-rewarding-smc
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models
☆16 · Feb 17, 2026 · Updated 5 months ago
TIGER-AI-Lab / EditReward
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]
☆155 · Apr 11, 2026 · Updated 3 months ago
usail-hkust / Agent-Omit
Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Reinforcement Learning
☆31 · May 11, 2026 · Updated 2 months ago
ApodexAI / AgentHarness
Evaluation harness for Apodex-1.0 on public deep-research benchmarks.
☆364 · Jun 8, 2026 · Updated last month
FractalAIResearchLabs / Fathom-DeepResearch
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval And Synthesis For SLMs
☆62 · Oct 7, 2025 · Updated 9 months ago
Xalp / MARS
Official Implementation of MARS
☆30 · Apr 21, 2026 · Updated 3 months ago
SalesforceAIResearch / UserBench
☆63 · Jun 2, 2026 · Updated last month
texttron / BrowseComp-Plus
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent (ACL 2026 Main)
☆318 · May 28, 2026 · Updated last month
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆199 · Sep 26, 2025 · Updated 9 months ago
KemingWu / HybridLayout
[ICCV 2025] Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics.
☆20 · Oct 23, 2025 · Updated 8 months ago
scaleapi / researchrubrics
Code repository for the ICLR 2026 paper "ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents" (https://ww…
☆27 · Feb 10, 2026 · Updated 5 months ago
RorschachChen / entangled-watermark-torch
☆18 · Nov 13, 2021 · Updated 4 years ago
marinero4972 / CyberV
☆20 · Jun 10, 2025 · Updated last year
lmarena / search-arena
⚔️ [ICLR 2026] Official code for "Search Arena: Analyzing Search-Augmented LLMs".
☆58 · Feb 23, 2026 · Updated 4 months ago