hkust-nlp/model-task-align-rl

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/hkust-nlp/model-task-align-rl)

hkust-nlp / model-task-align-rl

[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".

☆18

Alternatives and similar repositories for model-task-align-rl

Users that are interested in model-task-align-rl are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

hkust-nlp / RL-Verifier-Robustness
View on GitHub
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆24Oct 7, 2025Updated 9 months ago
PRIME-RL / RL-Compositionality
View on GitHub
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68Jan 26, 2026Updated 5 months ago
hkust-nlp / deepsearch-tts
View on GitHub
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
☆21Oct 8, 2025Updated 9 months ago
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated 11 months ago
hkust-nlp / LOCA-bench
View on GitHub
Benchmarking Language Agents Under Controllable and Extreme Context Growth
☆50Apr 29, 2026Updated 2 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
hkust-nlp / WebExplorer
View on GitHub
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆120Sep 29, 2025Updated 9 months ago
hkust-nlp / Toolathlon
View on GitHub
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆432Updated this week
TIGER-AI-Lab / One-Shot-CFT
View on GitHub
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]
☆33Sep 1, 2025Updated 10 months ago
hkust-nlp / Laser
View on GitHub
[ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆66May 22, 2025Updated last year
zjunlp / ReCode
View on GitHub
[AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates
☆25Jul 1, 2025Updated last year
Yibin-Lei / MetaEOL
View on GitHub
Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models"
☆12Jul 25, 2024Updated last year
koalazf99 / nanoverl
View on GitHub
Collections of RLxLM experiments using minimal codes
☆14Feb 17, 2025Updated last year
hkust-nlp / mstar
View on GitHub
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆75Jul 13, 2025Updated last year
Zhiyuan-Zeng / RLVE
View on GitHub
[ICML 2026] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
☆225Apr 30, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
cocoa-org / NanoRollout
View on GitHub
Scale digital agent rollouts without pain.
☆34Jun 18, 2026Updated last month
hkust-nlp / KernelGYM
View on GitHub
[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations [ICML…
☆194Mar 29, 2026Updated 3 months ago
martenlienen / bsi
View on GitHub
Generative Modeling with Bayesian Sample Inference
☆24May 17, 2025Updated last year
JingMog / THOR
View on GitHub
[ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".
☆33Feb 26, 2026Updated 4 months ago
GAIR-NLP / self-improvement-reversal
View on GitHub
☆13Jul 14, 2024Updated 2 years ago
hkust-nlp / AgentVista
View on GitHub
Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.
☆65Mar 10, 2026Updated 4 months ago
MajorDavidZhang / Generalization_unified_VLM
View on GitHub
☆24May 23, 2025Updated last year
hkust-nlp / llm-compression-intelligence
View on GitHub
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆150Sep 20, 2024Updated last year
junfeng0288 / MathReal
View on GitHub
☆15Aug 11, 2025Updated 11 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
VickiCui / MORE
View on GitHub
Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"
☆11Oct 11, 2024Updated last year
GongRzhe / Calendar-Autoauth-MCP-Server
View on GitHub
A Model Context Protocol (MCP) server for Google Calendar integration in Cluade Desktop with auto authentication support. This server ena…
☆12Mar 11, 2025Updated last year
open-compass / RePro
View on GitHub
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15Dec 5, 2025Updated 7 months ago
yongchao98 / R1-Code-Interpreter
View on GitHub
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
☆44Feb 9, 2026Updated 5 months ago
CraftJarvis / ROCKET-3
View on GitHub
Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
☆17Aug 19, 2025Updated 11 months ago
MoonshotAI / CombiBench
View on GitHub
☆52Jun 15, 2026Updated last month
gyhdog99 / RACRO2
View on GitHub
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆19Jul 1, 2025Updated last year
DerrickYLJ / LessIsMore
View on GitHub
[ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
☆34Sep 12, 2025Updated 10 months ago
hkust-nlp / B-STaR
View on GitHub
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86May 21, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
tianyi-lab / R2-T2
View on GitHub
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆19Mar 10, 2025Updated last year
FreedomIntelligence / TRIM
View on GitHub
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22Jan 11, 2026Updated 6 months ago
Kwai-Klear / CE-GPPO
View on GitHub
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
☆16Jan 23, 2026Updated 5 months ago
JmlrOrg / dmlr-style-file
View on GitHub
☆12Nov 21, 2023Updated 2 years ago
davidbrandfonbrener / color-filter-olmo
View on GitHub
☆13Dec 12, 2025Updated 7 months ago
0xbbdd / Some-Pwn-Challenges-from-winesap
View on GitHub
Some Pwn Challenges from winesap.
☆14Aug 15, 2019Updated 6 years ago
thu-coai / Stylized-Story-Generation-with-Style-Guided-Planning
View on GitHub
Codes for paper "Stylized Story Generation with Style-Guided Planning"
☆12May 9, 2021Updated 5 years ago