VILA-Lab / DELTLinks
(CVPR 2025) Official implementation to DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA top 1-acc by +1.3% and increases diversity per class by +5%
☆24Updated 2 months ago
Alternatives and similar repositories for DELT
Users that are interested in DELT are comparing it to the libraries listed below
Sorting:
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆78Updated 5 months ago
- Data distillation benchmark☆71Updated 5 months ago
- Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning☆52Updated last month
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆109Updated 4 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆20Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆211Updated 10 months ago
- Matryoshka Multimodal Models☆115Updated 9 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…☆51Updated 2 months ago
- ☆53Updated 10 months ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs☆155Updated last year
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆81Updated last month
- ☆105Updated 5 months ago
- CLIP-MoE: Mixture of Experts for CLIP☆50Updated last year
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆58Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆119Updated 3 months ago
- Adapting LLaMA Decoder to Vision Transformer☆30Updated last year
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models☆46Updated 10 months ago
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆23Updated 6 months ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆143Updated last year
- Official Repository of Personalized Visual Instruct Tuning☆32Updated 8 months ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆139Updated last month
- Model Merging with SVD to Tie the KnOTS [ICLR 2025]☆74Updated 7 months ago
- ☆27Updated last year
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆50Updated 6 months ago
- An open source implementation of CLIP (With TULIP Support)☆163Updated 6 months ago
- ☆30Updated 3 weeks ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆106Updated last year
- ☆123Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆59Updated last year
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning(NeurIPS 2024)☆53Updated 10 months ago