Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆43 Updated 3 weeks ago
Alternatives and similar repositories for FasterVLM:
Users who are interested in FasterVLM are comparing it to the libraries listed below.
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin… ☆66 Updated 2 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆50 Updated this week
- Code release for VTW (AAAI 2025) ☆27 Updated last month
- ☆31 Updated last month
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆17 Updated this week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆109 Updated 7 months ago
- [NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models. ☆58 Updated last week
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆15 Updated 3 weeks ago
- [NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks ☆101 Updated last month
- ☆61 Updated 2 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆18 Updated 3 weeks ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆89 Updated 6 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆21 Updated 2 months ago
- ☆107 Updated 5 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆160 Updated 3 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆40 Updated last week
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆88 Updated last month
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 Updated 3 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆59 Updated 4 months ago
- ✈️ Accelerating Vision Diffusion Transformers with Skip Branches. ☆58 Updated 3 weeks ago
- The official implementation of RAR ☆78 Updated 9 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆44 Updated 2 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe… ☆86 Updated 2 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs ☆152 Updated 2 months ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆194 Updated 2 months ago
- Official Repo of γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆28 Updated 2 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆64 Updated 2 months ago
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer ☆36 Updated 4 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆76 Updated this week
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆33 Updated 2 months ago