QiWang98 / VideoRFT
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
☆32 Updated 3 weeks ago
Alternatives and similar repositories for VideoRFT
Users who are interested in VideoRFT are comparing it to the repositories listed below.
- [ECCV 2024] Official repository of "GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning". ☆29 Updated 7 months ago
- [CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models ☆13 Updated 2 months ago
- ☆64 Updated 3 weeks ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆38 Updated 2 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 Updated last month
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆15 Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆60 Updated this week
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding ☆16 Updated 3 months ago
- ☆33 Updated last week
- Benchmarking Multi-Image Understanding in Vision and Language Models ☆11 Updated 11 months ago
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆26 Updated 9 months ago
- ☆11 Updated 3 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆23 Updated 3 months ago
- ☆16 Updated 2 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆31 Updated 4 months ago
- STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding? ☆22 Updated last week
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆59 Updated 4 months ago
- [TCSVT 2024] Temporally Consistent Referring Video Object Segmentation with Hybrid Memory ☆17 Updated 3 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆32 Updated last month
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 Updated 2 months ago
- ☆34 Updated 3 weeks ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆77 Updated 8 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆116 Updated last month
- Official implementation of MC-LLaVA. ☆32 Updated last month
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆27 Updated 8 months ago
- ☆10 Updated last year
- [ECCV 2024] Official repository for "DataDream: Few-shot Guided Dataset Generation" ☆40 Updated 11 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆37 Updated 5 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆40 Updated 4 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆37 Updated last year