ChangyuanWang17 / QVLM
[NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models.
☆41Updated last month
Related projects
Alternatives and complementary repositories for QVLM
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin…☆55Updated last month
- This is the official PyTorch implementation for the paper: Towards Accurate Post-training Quantization for Diffusion Models. (CVPR24 Poste…☆32Updated 5 months ago
- MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More☆20Updated last month
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer☆33Updated 2 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation☆31Updated 2 months ago
- This is a repo to track the latest autoregressive visual generation papers.☆50Updated this week
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆44Updated 3 weeks ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆100Updated 6 months ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation☆141Updated 3 weeks ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆54Updated 2 months ago
- Implements VAR+CLIP for image generation☆78Updated 3 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆47Updated this week
- Empowering Unified MLLM with Multi-granular Visual Generation☆106Updated last month
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆53Updated last month
- ☆23Updated 3 months ago
- [NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks☆31Updated this week
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook☆42Updated 4 months ago
- DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention☆113Updated 5 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆98Updated 6 months ago
- [TPAMI2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation☆17Updated 2 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆75Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆57Updated last month
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!☆118Updated 10 months ago
- [CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D De…☆86Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆33Updated 2 weeks ago
- 🔥ImageFolder: Autoregressive Image Generation with Folded Tokens☆55Updated last week
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"☆33Updated 5 months ago
- ☆21Updated last month
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆39Updated 3 months ago