resistzzz / Co-Reward

Co-Reward: Self-supervised RL for LLM Reasoning via Contrastive Agreement

☆29

Alternatives and similar repositories for Co-Reward

Users who are interested in Co-Reward are comparing it to the repositories listed below.

GAIR-NLP / thinking-with-generated-images
Doodling our way to AGI ✏️ 🖼️ 🧠
☆103 Updated 3 months ago
QingyangZhang / EMPO
EMPO, A Fully Unsupervised RLVR Method
☆66 Updated this week
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36 Updated 5 months ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆84 Updated 7 months ago
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆133 Updated 5 months ago
RayRuiboChen / Self-Filter
☆25 Updated 2 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70 Updated last year
RainBowLuoCS / DEEM
(ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.
☆39 Updated 2 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆80 Updated last year
ShadeCloak / ADORA
☆46 Updated 5 months ago
beccabai / Data-centric_multimodal_LLM
Survey on Data-centric Large Language Models
☆84 Updated last year
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82 Updated 7 months ago
shiqichen17 / VLM_Merging
GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆74 Updated this week
SUSTechBruce / SRPO_MLLMs
[NeurIPS 2025 🔥] Main source code of the SRPO framework.
☆83 Updated this week
NUS-TRAIL / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆91 Updated last week
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆48 Updated last year
GaryStack / MMR-V
Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?
☆36 Updated 3 months ago
pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆59 Updated last year
shikiw / Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…
☆106 Updated 2 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆81 Updated 10 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆86 Updated 7 months ago
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆109 Updated 2 weeks ago
si0wang / VisVM
☆45 Updated 8 months ago
tmlr-group / Co-rewarding
[arXiv:2508.00410] "Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement"
☆37 Updated last month
MME-Benchmarks / MME-CoT
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
☆129 Updated last month
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆85 Updated last month
vlm2-bench / VLM2-Bench
VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
☆42 Updated 4 months ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83 Updated 8 months ago
Chongjie-Si / Subspace-Tuning
A generalized framework for subspace tuning methods in parameter efficient fine-tuning.
☆155 Updated 3 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆90 Updated 8 months ago