[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆17 · Feb 9, 2026 · Updated last month
Alternatives and similar repositories for model-task-align-rl
Users that are interested in model-task-align-rl are comparing it to the libraries listed below.
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆64 · Jan 26, 2026 · Updated last month
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates ☆24 · Jul 1, 2025 · Updated 8 months ago
- ☆44 · Dec 16, 2025 · Updated 3 months ago
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning ☆34 · Feb 9, 2026 · Updated last month
- Toolathlon-Gym for testing AI agents' real-world tool-use capabilities across diverse MCP servers. ☆87 · Updated this week
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each… ☆95 · Mar 12, 2026 · Updated last week
- ROS2 Bag file parsing ☆10 · Mar 14, 2020 · Updated 6 years ago
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆32 · Feb 26, 2026 · Updated 3 weeks ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 · Oct 24, 2025 · Updated 4 months ago
- ☆34 · Aug 7, 2025 · Updated 7 months ago
- ☆25 · Aug 19, 2025 · Updated 7 months ago
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆35 · Aug 28, 2025 · Updated 6 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆29 · Aug 19, 2025 · Updated 7 months ago
- Collections of RLxLM experiments using minimal code ☆14 · Feb 17, 2025 · Updated last year
- Inductive reasoning benchmark with subregular hierarchy for string-to-string transformation ☆16 · Jun 27, 2025 · Updated 8 months ago
- Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning" ☆11 · Oct 11, 2024 · Updated last year
- Generative Modeling with Bayesian Sample Inference ☆24 · May 17, 2025 · Updated 10 months ago
- ☆28 · Feb 15, 2026 · Updated last month
- ☆13 · Jul 14, 2024 · Updated last year
- [ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆63 · May 22, 2025 · Updated 10 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆21 · Jan 11, 2026 · Updated 2 months ago
- Dive-into-LLMs Tutorial for Beginners ☆12 · May 14, 2024 · Updated last year
- 🧌 Live2d models for cnblog themes. ☆13 · Apr 3, 2022 · Updated 3 years ago
- A Model Context Protocol (MCP) server for Google Calendar integration in Claude Desktop with auto authentication support. This server ena… ☆13 · Mar 11, 2025 · Updated last year
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆31 · Oct 9, 2025 · Updated 5 months ago
- Resa: Transparent Reasoning Models via SAEs ☆48 · Sep 23, 2025 · Updated 6 months ago
- ☆12 · Nov 21, 2023 · Updated 2 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents ☆26 · Mar 9, 2026 · Updated 2 weeks ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆53 · Jul 15, 2025 · Updated 8 months ago
- Some Pwn Challenges from winesap. ☆14 · Aug 15, 2019 · Updated 6 years ago
- Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models" ☆87 · May 27, 2025 · Updated 9 months ago
- Initial commit ☆13 · Aug 14, 2023 · Updated 2 years ago
- ☆13 · Dec 12, 2025 · Updated 3 months ago
- [NeurIPS 2024] Image Understanding Makes for A Good Tokenizer for Image Generation ☆22 · Dec 17, 2024 · Updated last year
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics ☆28 · Nov 1, 2025 · Updated 4 months ago
- Competitive Programming Code Template ☆11 · Nov 6, 2022 · Updated 3 years ago
- [NeurIPS 2024 Spotlight] CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning. ☆14 · Dec 12, 2024 · Updated last year