ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆47 · Updated this week
Alternatives and similar repositories for TinyLLaVA-Video:
Users interested in TinyLLaVA-Video are comparing it to the repositories listed below.
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 6 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 5 months ago
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding ☆41 · Updated 2 weeks ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆68 · Updated 6 months ago
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆54 · Updated this week
- LMM solved catastrophic forgetting, AAAI 2025 ☆40 · Updated 5 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆97 · Updated 7 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆49 · Updated last year
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆67 · Updated last month
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆132 · Updated 3 months ago
- LinVT: Empower Your Image-level Large Language Model to Understand Videos ☆71 · Updated 3 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆155 · Updated 3 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆127 · Updated 2 weeks ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆120 · Updated 5 months ago
- The Next Step Forward in Multimodal LLM Alignment ☆145 · Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs ☆159 · Updated 6 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 · Updated 10 months ago
- The official repository for the RealSyn dataset ☆21 · Updated last month
- A Token-level Text Image Foundation Model for Document Understanding ☆87 · Updated 3 weeks ago
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ☆155 · Updated last week
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 · Updated 4 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated 6 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆64 · Updated 7 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆69 · Updated 2 months ago
- Official project page of "HiMix: Reducing Computational Complexity in Large Vision-Language Models" ☆10 · Updated 2 months ago