resistzzz / Co-rewarding
[arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
☆30 Updated last month
Alternatives and similar repositories for Co-rewarding
Users who are interested in Co-rewarding are comparing it to the repositories listed below.
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36 Updated 7 months ago
- [NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method ☆84 Updated this week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆87 Updated 9 months ago
- [arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" ☆44 Updated last month
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆98 Updated 2 months ago
- ☆25 Updated 4 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆82 Updated 5 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆70 Updated last year
- (ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. ☆44 Updated 4 months ago
- dParallel: Learnable Parallel Decoding for dLLMs ☆42 Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆81 Updated last month
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆82 Updated 2 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆104 Updated last year
- A Sober Look at Language Model Reasoning ☆89 Updated last week
- Doodling our way to AGI ✏️ 🖼️ 🧠 ☆113 Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆142 Updated 4 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆46 Updated last year
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆88 Updated 11 months ago
- ☆21 Updated 7 months ago
- ☆32 Updated 6 months ago
- ☆30 Updated last week
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆40 Updated 6 months ago
- One-shot Entropy Minimization ☆187 Updated 5 months ago
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆161 Updated 5 months ago
- ☆102 Updated 10 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆21 Updated 7 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆83 Updated last year
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆36 Updated 5 months ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆142 Updated last month
- [ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual… ☆82 Updated 9 months ago