L-O-I / RRVF
☆18Updated 5 months ago
Alternatives and similar repositories for RRVF
Users that are interested in RRVF are comparing it to the libraries listed below
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Updated 7 months ago
- ☆132Updated 9 months ago
- ☆57Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆85Updated 5 months ago
- ☆96Updated 6 months ago
- Official implementation of MIA-DPO☆70Updated 11 months ago
- ☆80Updated 6 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆35Updated 5 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆87Updated 5 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 5 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Updated 4 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆235Updated 4 months ago
- ☆33Updated last month
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆50Updated 9 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆137Updated 7 months ago
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆63Updated 2 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training☆98Updated 5 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆110Updated 2 weeks ago
- A Simple Framework of Small-scale LMMs for Video Understanding☆107Updated 7 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆94Updated 10 months ago
- SFT+RL boosts multimodal reasoning☆41Updated 6 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆134Updated last week
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"☆172Updated this week
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆152Updated 4 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆112Updated 2 months ago
- ☆21Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning☆133Updated 9 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆113Updated 5 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆76Updated last year
- ☆38Updated 2 months ago