MengLcool / SliMM
☆21Updated 10 months ago
Alternatives and similar repositories for SliMM
Users that are interested in SliMM are comparing it to the libraries listed below
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆106Updated 3 weeks ago
 - Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆32Updated 7 months ago
 - Visual Instruction Tuning for Qwen2 Base Model☆39Updated last year
 - [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆53Updated 7 months ago
 - [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆136Updated 5 months ago
 - A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆100Updated 11 months ago
 - ☆125Updated 7 months ago
 - [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models☆55Updated 5 months ago
 - ☆133Updated last year
 - Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…☆97Updated 2 months ago
 - ☆21Updated 9 months ago
 - DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆157Updated 10 months ago
 - [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant☆171Updated 3 months ago
 - Official repository for CoMM Dataset☆48Updated 10 months ago
 - The official implementation of RAR☆92Updated last year
 - What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness☆24Updated 5 months ago
 - ☆80Updated 11 months ago
 - Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆94Updated 9 months ago
 - 【NeurIPS 2024】Dense Connector for MLLMs☆179Updated last year
 - Evaluation code for Ref-L4, a new REC benchmark in the LMM era☆49Updated 10 months ago
 - Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.☆240Updated 2 months ago
 - ☆118Updated last year
 - [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆132Updated 2 months ago
 - [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation☆205Updated 7 months ago
 - Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆67Updated last year
 - ☆119Updated last year
 - Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆55Updated 4 months ago
 - R1-Vision: Let's first take a look at the image☆48Updated 8 months ago
 - [ECCV 2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models☆19Updated last year
 - [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆218Updated 7 months ago