VILA-Lab / DELT
(CVPR 2025) Official implementation of DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation, which outperforms SOTA top-1 accuracy by +1.3% and increases per-class diversity by +5%
☆19 · Updated 2 weeks ago
Alternatives and similar repositories for DELT:
Users who are interested in DELT are comparing it to the repositories listed below:
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆19 · Updated 4 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆74 · Updated 5 months ago
- Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning ☆44 · Updated 3 weeks ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- Matryoshka Multimodal Models ☆97 · Updated last month
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆57 · Updated 2 weeks ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?" ☆128 · Updated 9 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 8 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆24 · Updated 5 months ago
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆53 · Updated 2 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆26 · Updated 9 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆48 · Updated 3 months ago
- ☆44 · Updated 10 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆72 · Updated last week
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆42 · Updated 2 months ago
- ☆42 · Updated last month
- The official implementation of "MLLMs-Augmented Visual-Language Representation Learning" ☆31 · Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 7 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆32 · Updated 3 months ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs ☆141 · Updated 7 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆63 · Updated 4 months ago
- ☆38 · Updated 2 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆53 · Updated 10 months ago
- ☆111 · Updated 7 months ago
- ☆48 · Updated 2 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆44 · Updated 2 months ago
- ☆39 · Updated 4 months ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆29 · Updated 4 months ago
- Data distillation benchmark ☆57 · Updated 3 weeks ago
- Official Repository of Personalized Visual Instruct Tuning ☆27 · Updated last week