ByteDance-BandAI / ReportBench
A comprehensive benchmark for evaluating deep research agents on academic survey tasks
☆32 Updated last month
Alternatives and similar repositories for ReportBench
Users who are interested in ReportBench are comparing it to the repositories listed below
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 10 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆51 Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆26 Updated last month
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 Updated last year
- instruction-following benchmark for large reasoning models ☆45 Updated 2 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆30 Updated last year
- ☆19 Updated 10 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆73 Updated 11 months ago
- ☆58 Updated last year
- ☆63 Updated 4 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆123 Updated last week
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆23 Updated 3 weeks ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆57 Updated 4 months ago
- ☆35 Updated last year
- The code and data for the paper JiuZhang3.0 ☆49 Updated last year
- The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" ☆81 Updated last month
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆98 Updated 3 weeks ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆109 Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆51 Updated 4 months ago
- ☆30 Updated 10 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆75 Updated 5 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆42 Updated 8 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 Updated last year
- Towards Systematic Measurement for Long Text Quality ☆36 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆114 Updated 5 months ago
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023) ☆28 Updated last year
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆82 Updated 9 months ago
- ☆36 Updated 3 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 Updated last year