A comprehensive benchmark for evaluating deep research agents on academic survey tasks
☆51 · Sep 4, 2025 · Updated 7 months ago
Alternatives and similar repositories for ReportBench
Users that are interested in ReportBench are comparing it to the libraries listed below.
- LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ☆40 · Oct 20, 2025 · Updated 5 months ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆43 · Oct 31, 2025 · Updated 5 months ago
- ☆25 · Dec 13, 2024 · Updated last year
- ☆17 · May 31, 2023 · Updated 2 years ago
- A holistic framework for advancing LLMs as data science agents ☆40 · Feb 3, 2026 · Updated 2 months ago
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'. ☆21 · Apr 3, 2025 · Updated last year
- Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022 ☆15 · Mar 31, 2023 · Updated 3 years ago
- ☆151 · May 14, 2025 · Updated 10 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". ☆35 · Dec 6, 2025 · Updated 4 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆25 · Oct 7, 2025 · Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Dec 19, 2024 · Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆32 · Jan 25, 2026 · Updated 2 months ago
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025. ☆16 · Oct 20, 2025 · Updated 5 months ago
- Llemma formal2formal (tactic prediction) theorem proving experiments ☆20 · Oct 17, 2023 · Updated 2 years ago
- ☆17 · Jul 12, 2025 · Updated 8 months ago
- ☆14 · Dec 18, 2024 · Updated last year
- The code and data for the paper JiuZhang3.0 ☆49 · May 26, 2024 · Updated last year
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models ☆24 · Jan 6, 2026 · Updated 3 months ago
- An Interpretable Neuro-Symbolic Framework for Task-Oriented Dialogue Generation ☆23 · Mar 6, 2022 · Updated 4 years ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆30 · Jul 17, 2024 · Updated last year
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza… ☆20 · Nov 21, 2024 · Updated last year
- Code for paper: Long cOntext aliGnment via efficient preference Optimization ☆24 · Oct 10, 2025 · Updated 6 months ago
- An (incomplete) overview of information extraction ☆43 · Apr 28, 2022 · Updated 3 years ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs ☆18 · Updated this week
- [AAAI'26, Oral] Code for "Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Lea… ☆43 · Jul 16, 2025 · Updated 8 months ago
- MATCH-TUNING ☆15 · Aug 6, 2022 · Updated 3 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆102 · Feb 20, 2025 · Updated last year
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆25 · May 29, 2024 · Updated last year
- OmniGAIA: Towards Native Omni-Modal AI Agents ☆89 · Apr 2, 2026 · Updated last week
- ☆46 · Jun 11, 2025 · Updated 9 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 6 months ago
- ☆15 · Jan 12, 2026 · Updated 2 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆82 · Dec 25, 2025 · Updated 3 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" ☆15 · Jul 24, 2024 · Updated last year
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral] ☆45 · Aug 25, 2025 · Updated 7 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness" ☆22 · Feb 17, 2025 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Nov 26, 2023 · Updated 2 years ago
- ☆58 · Mar 16, 2026 · Updated 3 weeks ago