A Simple Framework of Small-scale LMMs for Video Understanding
☆108 · Jun 11, 2025 · Updated 8 months ago
Alternatives and similar repositories for TinyLLaVA-Video
Users who are interested in TinyLLaVA-Video are comparing it to the repositories listed below.
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆114 · Dec 24, 2025 · Updated 2 months ago
- A Framework of Small-scale Large Multimodal Models ☆963 · Feb 7, 2026 · Updated last month
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆54 · Mar 9, 2025 · Updated 11 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆82 · Jul 4, 2025 · Updated 8 months ago
- ☆41 · Sep 9, 2025 · Updated 5 months ago
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆385 · Jun 13, 2025 · Updated 8 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆92 · Aug 8, 2025 · Updated 6 months ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆25 · Jul 21, 2025 · Updated 7 months ago
- A Framework for Collaboration of Experts from Benchmark ☆13 · Apr 27, 2025 · Updated 10 months ago
- Image Tokenizer Needs Post-Training ☆24 · Oct 4, 2025 · Updated 5 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆33 · May 27, 2025 · Updated 9 months ago
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆1,109 · Aug 14, 2025 · Updated 6 months ago
- Official implementation for "Diffusion Instruction Tuning" ☆31 · Jun 10, 2025 · Updated 8 months ago
- ☆98 · Jun 23, 2025 · Updated 8 months ago
- A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long… ☆18 · Sep 12, 2025 · Updated 5 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆66 · Jun 10, 2025 · Updated 8 months ago
- Agently Stage - Efficient Convenient Asynchronous & Multithreaded Programming ☆13 · Apr 2, 2025 · Updated 11 months ago
- DeepTrace: A lightweight, scalable real-time diagnostic and analysis tool for distributed training tasks. ☆18 · Nov 4, 2025 · Updated 4 months ago
- ☆107 · Jun 10, 2025 · Updated 8 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆271 · Jan 20, 2026 · Updated last month
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆194 · Mar 17, 2025 · Updated 11 months ago
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆260 · Oct 18, 2025 · Updated 4 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Jul 1, 2024 · Updated last year
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh… ☆19 · Dec 30, 2025 · Updated 2 months ago
- ☆18 · Nov 10, 2024 · Updated last year
- ☆118 · May 26, 2025 · Updated 9 months ago
- Official repo and evaluation implementation of VSI-Bench ☆675 · Aug 5, 2025 · Updated 7 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 · Oct 9, 2025 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆42 · Nov 30, 2025 · Updated 3 months ago
- ☆37 · Sep 16, 2024 · Updated last year
- Official implementation of MIA-DPO ☆71 · Jan 23, 2025 · Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆43 · Mar 11, 2025 · Updated 11 months ago
- ☆49 · Apr 11, 2025 · Updated 10 months ago
- DPO, but faster 🚀 ☆48 · Dec 6, 2024 · Updated last year
- ☆85 · Apr 21, 2025 · Updated 10 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆22 · Feb 23, 2025 · Updated last year
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap". ☆126 · Jan 24, 2026 · Updated last month
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 · May 27, 2025 · Updated 9 months ago