John-AI-Lab / NoisyRollout

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

☆53

Alternatives and similar repositories for NoisyRollout:

Users that are interested in NoisyRollout are comparing it to the libraries listed below

RifleZhang / LLaVA-Reasoner-DPO
☆75Updated 4 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆84Updated this week
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆47Updated 4 months ago
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆77Updated 3 months ago
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆104Updated 2 weeks ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆100Updated 2 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆73Updated 11 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆65Updated 11 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆73Updated 6 months ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆79Updated 3 months ago
yaolinli / TimeChat-Online
TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆13Updated this week
YuxiXie / V-DPO
Preference Learning for LLaVA
☆44Updated 6 months ago
si0wang / VisVM
☆40Updated 4 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆56Updated 7 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆52Updated last month
ShadeCloak / ADORA
☆43Updated last month
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆57Updated 3 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆94Updated 2 months ago
yfzhang114 / r1_reward
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆63Updated this week
pengts / VW-LMM
☆25Updated 11 months ago
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆32Updated 3 weeks ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆47Updated 2 months ago
vlf-silkie / VLFeedback
☆99Updated last year
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆55Updated 9 months ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆32Updated 7 months ago
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆34Updated 3 weeks ago
mlfoundations / VisIT-Bench
☆51Updated last year
TIGER-AI-Lab / VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"
☆24Updated this week
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆46Updated 6 months ago
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆55Updated 8 months ago