hulianyuyy / iLLaVALinks
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
☆21Updated last year
Alternatives and similar repositories for iLLaVA
Users that are interested in iLLaVA are comparing it to the libraries listed below
Sorting:
- CLIP-MoE: Mixture of Experts for CLIP☆55Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆44Updated 9 months ago
- [ICCV 2025] Dynamic-VLM☆28Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆54Updated 3 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆62Updated last year
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆23Updated 9 months ago
- ☆46Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Updated last year
- [ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆42Updated 3 months ago
- ☆54Updated last year
- ☆64Updated last week
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Updated 7 months ago
- ☆64Updated 2 months ago
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models☆46Updated last year
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆50Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Updated last year
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding☆23Updated 11 months ago
- [NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO☆78Updated 3 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆31Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning☆34Updated 10 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆46Updated last year
- Official implement of MIA-DPO☆70Updated last year
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Updated 11 months ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆47Updated last week
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Updated 5 months ago
- Adapting LLaMA Decoder to Vision Transformer☆30Updated last year
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆38Updated 10 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Updated last year
- (ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆47Updated 7 months ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs☆66Updated 7 months ago