MengLcool / SliMM
☆18 Updated last month
Alternatives and similar repositories for SliMM:
Users interested in SliMM are comparing it to the repositories listed below.
- ☆17 Updated last month
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆123 Updated 2 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆133 Updated 2 months ago
- ☆132 Updated last year
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆253 Updated last month
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆86 Updated 2 months ago
- Official repository of MMDU dataset ☆83 Updated 4 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆117 Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators ☆65 Updated 2 months ago
- ☆65 Updated 2 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆63 Updated 5 months ago
- A collection of visual instruction tuning datasets. ☆76 Updated 11 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆45 Updated 7 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆51 Updated last month
- ☆65 Updated 3 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆37 Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆53 Updated last month
- The official implementation of RAR ☆81 Updated 10 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆70 Updated 3 weeks ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆156 Updated 4 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆29 Updated this week
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆88 Updated last month
- ☆114 Updated 8 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆92 Updated last week
- ☆36 Updated last month
- ☆42 Updated 2 months ago
- ☆33 Updated 7 months ago