CR400AF-A / SparseMM
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
☆76 · Updated last week
Alternatives and similar repositories for SparseMM
Users who are interested in SparseMM are comparing it to the repositories listed below.
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache · ☆42 · Updated last year
- [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'2… · ☆88 · Updated 4 months ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models · ☆98 · Updated last year
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator · ☆114 · Updated 7 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models · ☆180 · Updated 11 months ago
- ✨✨ Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy · ☆301 · Updated 5 months ago
- [NAACL 2025 Oral] From redundancy to relevance: Enhancing explainability in multimodal large language models · ☆120 · Updated 8 months ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models. · ☆73 · Updated 4 months ago
- Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs". · ☆72 · Updated 4 months ago
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response · ☆42 · Updated 10 months ago
- [ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO · ☆35 · Updated 3 months ago
- Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience" · ☆198 · Updated 2 months ago
- [ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation" · ☆177 · Updated last month
- SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation · ☆55 · Updated 6 months ago
- A Gaussian dense reward framework for GUI grounding training · ☆228 · Updated 2 months ago
- A collection of token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI · ☆188 · Updated 2 months ago
- Your efficient and accurate answer verification system for RL training. · ☆41 · Updated 4 months ago
- [Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey · ☆319 · Updated 3 weeks ago
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model · ☆134 · Updated 6 months ago
- Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral] · ☆117 · Updated last year
- ☆131 · Updated 9 months ago
- [ICLR 2025] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities · ☆145 · Updated 9 months ago
- (NeurIPS 2024) Official PyTorch implementation of LOVA3 · ☆90 · Updated 7 months ago
- Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models · ☆158 · Updated 10 months ago
- [NeurIPS 2025] Hybrid Latent Reasoning via Reinforcement Learning · ☆155 · Updated last month
- WorldGPT: Empowering LLM as Multimodal World Model · ☆116 · Updated last year
- [NeurIPS 2025] Efficient Reasoning Vision Language Models · ☆407 · Updated last month
- ☆69 · Updated 7 months ago
- CoS: Chain-of-Shot Prompting for Long Video Understanding · ☆51 · Updated 8 months ago
- [BMVC 2025] Official implementation for paper EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and… · ☆105 · Updated 2 months ago