cxcscmu / deepresearch_benchmarking
☆23 Updated 5 months ago
Alternatives and similar repositories for deepresearch_benchmarking
Users who are interested in deepresearch_benchmarking are comparing it to the repositories listed below
- ☆69 Updated 6 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆51 Updated 5 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Updated 4 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆55 Updated 2 months ago
- ☆54 Updated 3 months ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research ☆20 Updated 3 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 Updated 9 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆24 Updated 3 months ago
- ☆64 Updated 2 months ago
- ☆19 Updated 8 months ago
- ☆47 Updated 3 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆145 Updated 3 months ago
- ☆45 Updated 3 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆25 Updated 4 months ago
- A repo for open research on building large reasoning models ☆126 Updated last week
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 Updated 5 months ago
- ☆34 Updated 7 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆55 Updated this week
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 Updated 2 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 Updated 8 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 8 months ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models ☆49 Updated 2 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated last year
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆72 Updated 8 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 Updated 10 months ago
- ☆21 Updated last year
- Reinforcing General Reasoning without Verifiers ☆93 Updated 6 months ago
- ☆33 Updated last month
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆130 Updated 2 months ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ☆49 Updated 2 weeks ago