YutingLi0606 / Vision-Matters
(arXiv 2025) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
☆37 · Updated 2 weeks ago
Alternatives and similar repositories for Vision-Matters
Users interested in Vision-Matters are comparing it to the repositories listed below.
- This repository contains the code for our ICML 2025 paper, "LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection" 🎉 ☆22 · Updated 3 weeks ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆69 · Updated 3 weeks ago
- ☆80 · Updated 5 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆80 · Updated 5 months ago
- Official implementation of MIA-DPO ☆58 · Updated 5 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆23 · Updated 2 weeks ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆47 · Updated 3 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆30 · Updated 5 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆111 · Updated 3 months ago
- [ICML 2025] Official code for "From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection" ☆14 · Updated this week
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 · Updated 9 months ago
- Code and data for the paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation" ☆16 · Updated last month
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆15 · Updated 2 months ago
- ☆49 · Updated last month
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆41 · Updated 2 weeks ago
- Code release for VTW (AAAI 2025 Oral) ☆43 · Updated 5 months ago
- Official implementation of MC-LLaVA ☆28 · Updated 3 weeks ago
- ☆86 · Updated 3 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆28 · Updated last month
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆55 · Updated this week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆66 · Updated 11 months ago
- ☆25 · Updated last year
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 · Updated 5 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆62 · Updated 3 weeks ago
- ☆14 · Updated last month
- ☆42 · Updated 7 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆73 · Updated 2 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆29 · Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆50 · Updated 6 months ago
- The official code for "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆119 · Updated 3 weeks ago