lcqysl / FrameThinker-RL
☆31Updated 2 months ago
Alternatives and similar repositories for FrameThinker-RL
Users interested in FrameThinker-RL are comparing it to the libraries listed below.
- ☆35Updated last year
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆102Updated 3 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆53Updated 9 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆48Updated 2 months ago
- [NeurIPS 2024] The official code of paper "Automated Multi-level Preference for MLLMs"☆20Updated last year
- Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models☆75Updated 5 months ago
- ☆21Updated 8 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆59Updated 6 months ago
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning☆23Updated this week
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆65Updated 6 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆51Updated last year
- ☆32Updated last year
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆19Updated 5 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Updated 5 months ago
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆61Updated last week
- ☆46Updated 11 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 2 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Updated 2 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆45Updated last year
- ☆18Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".☆38Updated last year
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆108Updated 2 weeks ago
- Official repository for CoMM Dataset☆48Updated 11 months ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models☆63Updated 7 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"☆59Updated 6 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆37Updated last week
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆51Updated 5 months ago
- MR. Video: MapReduce is the Principle for Long Video Understanding☆28Updated 8 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 8 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆86Updated 3 months ago