zhyang2226 / AR-Lopti
[arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆32 Updated last month
Alternatives and similar repositories for AR-Lopti
Users who are interested in AR-Lopti are comparing it to the repositories listed below.
- ☆46 Updated 2 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆75 Updated 4 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆38 Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆64 Updated last month
- ☆139 Updated last month
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆55 Updated 10 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆61 Updated 6 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆210 Updated this week
- A Sober Look at Language Model Reasoning ☆74 Updated 2 weeks ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆72 Updated 3 weeks ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆125 Updated last month
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆48 Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- A repo for open research on building large reasoning models ☆50 Updated this week
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆41 Updated last year
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆54 Updated 2 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆62 Updated 3 weeks ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆124 Updated 2 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆34 Updated 2 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆37 Updated 2 weeks ago
- A Collection of Papers on Diffusion Language Models ☆82 Updated last week
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆125 Updated this week
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆28 Updated last month
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025) ☆29 Updated 2 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆65 Updated 2 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆54 Updated 7 months ago
- ☆80 Updated 5 months ago
- This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib… ☆19 Updated 3 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆104 Updated last month
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?" ☆46 Updated 3 weeks ago