NVlabs / NFT
Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning"
☆27 Updated last month
Alternatives and similar repositories for NFT
Users who are interested in NFT are comparing it to the repositories listed below
- ☆48 Updated last month
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆39 Updated last week
- ☆75 Updated last week
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆103 Updated last week
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 Updated 4 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 5 months ago
- A repo for open research on building large reasoning models ☆71 Updated this week
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆72 Updated last month
- Code for "Reasoning to Learn from Latent Thoughts" ☆112 Updated 3 months ago
- The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing" ☆39 Updated this week
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆47 Updated 2 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆63 Updated 3 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆67 Updated 2 months ago
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆52 Updated 7 months ago
- paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆82 Updated 3 weeks ago
- ☆174 Updated 3 weeks ago
- RFTT: Reasoning with Reinforced Functional Token Tuning ☆28 Updated last month
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆113 Updated 3 weeks ago
- Multimodal RewardBench ☆42 Updated 4 months ago
- ☆46 Updated 2 months ago
- Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP". ☆77 Updated 2 weeks ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆89 Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆64 Updated last month
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆71 Updated last week
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- [Preprint 2025] Thinkless: LLM Learns When to Think ☆201 Updated 3 weeks ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆99 Updated last week
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆24 Updated last week
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆152 Updated last week