[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆15 · Feb 9, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for model-task-align-rl
Users who are interested in model-task-align-rl compare it to the libraries listed below.
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones · ☆64 · Jan 26, 2026 · Updated last month
- ☆25 · Aug 19, 2025 · Updated 6 months ago
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates · ☆22 · Jul 1, 2025 · Updated 8 months ago
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". · ☆31 · Updated this week
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning · ☆62 · Oct 24, 2025 · Updated 4 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" · ☆19 · Mar 10, 2025 · Updated 11 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? · ☆48 · Dec 25, 2025 · Updated 2 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions · ☆29 · Oct 9, 2025 · Updated 4 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) · ☆19 · Jul 1, 2025 · Updated 8 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization · ☆52 · Jul 15, 2025 · Updated 7 months ago
- ☆31 · Aug 7, 2025 · Updated 6 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". · ☆26 · Aug 9, 2025 · Updated 6 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models · ☆21 · Jan 29, 2025 · Updated last year
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning · ☆35 · Aug 28, 2025 · Updated 6 months ago
- a survey on deep research · ☆47 · Sep 9, 2025 · Updated 5 months ago
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges · ☆28 · May 14, 2025 · Updated 9 months ago
- Generative Modeling with Bayesian Sample Inference · ☆24 · May 17, 2025 · Updated 9 months ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- The paper list of multilingual pre-trained models (Continual Updated). · ☆24 · Jun 18, 2024 · Updated last year
- [ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping · ☆62 · May 22, 2025 · Updated 9 months ago
- ☆42 · Dec 16, 2025 · Updated 2 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · ☆29 · May 22, 2025 · Updated 9 months ago
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models · ☆59 · Feb 22, 2026 · Updated last week
- Resa: Transparent Reasoning Models via SAEs · ☆47 · Sep 23, 2025 · Updated 5 months ago
- DCPO: Dynamic Adaptive Clipping for RL · ☆46 · Dec 20, 2025 · Updated 2 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" · ☆59 · Jan 5, 2026 · Updated last month
- ☆60 · Jan 12, 2026 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… · ☆12 · Jun 28, 2025 · Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models · ☆41 · Feb 24, 2026 · Updated last week
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents · ☆37 · Oct 7, 2025 · Updated 4 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. · ☆31 · Aug 7, 2025 · Updated 6 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios · ☆16 · Oct 18, 2024 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling · ☆42 · Dec 29, 2025 · Updated 2 months ago
- Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models" · ☆86 · May 27, 2025 · Updated 9 months ago
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… · ☆78 · Feb 10, 2026 · Updated 3 weeks ago
- instruction-following benchmark for large reasoning models · ☆44 · Aug 9, 2025 · Updated 6 months ago
- ☆155 · Nov 24, 2025 · Updated 3 months ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics · ☆27 · Nov 1, 2025 · Updated 4 months ago