A comprehensive benchmark for evaluating deep research agents on academic survey tasks
☆51 · Sep 4, 2025 · Updated 8 months ago
Alternatives and similar repositories for ReportBench
Users who are interested in ReportBench are comparing it to the repositories listed below.
- LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ☆41 · Oct 20, 2025 · Updated 7 months ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆43 · Oct 31, 2025 · Updated 6 months ago
- ☆25 · Dec 13, 2024 · Updated last year
- ☆17 · May 31, 2023 · Updated 2 years ago
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'. ☆21 · Apr 3, 2025 · Updated last year
- Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022 ☆15 · Mar 31, 2023 · Updated 3 years ago
- AI for Mathematics Paper List ☆17 · Jan 14, 2025 · Updated last year
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". ☆37 · Dec 6, 2025 · Updated 5 months ago
- ☆152 · May 14, 2025 · Updated last year
- A holistic framework for advancing LLMs as data science agents ☆40 · Feb 3, 2026 · Updated 3 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆17 · Dec 19, 2024 · Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆32 · Jan 25, 2026 · Updated 3 months ago
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025. ☆17 · Oct 20, 2025 · Updated 7 months ago
- Llemma formal2formal (tactic prediction) theorem proving experiments ☆20 · Oct 17, 2023 · Updated 2 years ago
- The code of CIKM 2023 (Oral Presentation): A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NE… ☆14 · Jul 19, 2024 · Updated last year
- ☆17 · Jul 12, 2025 · Updated 10 months ago
- The code and data for the paper JiuZhang3.0 ☆49 · May 26, 2024 · Updated last year
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models ☆24 · Jan 6, 2026 · Updated 4 months ago
- An Interpretable Neuro-Symbolic Framework for Task-Oriented Dialogue Generation ☆23 · Mar 6, 2022 · Updated 4 years ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆30 · Jul 17, 2024 · Updated last year
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza… ☆20 · Nov 21, 2024 · Updated last year
- Code for paper: Long cOntext aliGnment via efficient preference Optimization ☆24 · Oct 10, 2025 · Updated 7 months ago
- An (incomplete) overview of information extraction ☆43 · Apr 28, 2022 · Updated 4 years ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs ☆19 · Apr 4, 2026 · Updated last month
- [AAAI'26, Oral] Code for "Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Lea… ☆45 · Jul 16, 2025 · Updated 10 months ago
- MATCH-TUNING ☆15 · Aug 6, 2022 · Updated 3 years ago
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆25 · May 29, 2024 · Updated last year
- ☆46 · Jun 11, 2025 · Updated 11 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 7 months ago
- ☆15 · Apr 6, 2026 · Updated last month
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆82 · Dec 25, 2025 · Updated 4 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" ☆15 · Jul 24, 2024 · Updated last year
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral] ☆45 · Aug 25, 2025 · Updated 8 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Nov 26, 2023 · Updated 2 years ago
- ☆83 · May 28, 2025 · Updated 11 months ago
- ☆48 · Aug 5, 2025 · Updated 9 months ago
- [AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models ☆26 · Sep 24, 2025 · Updated 7 months ago
- ☆19 · Nov 4, 2025 · Updated 6 months ago