xuyang-liu16 / GlobalCom2
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆16 · Updated this week
Alternatives and similar repositories for GlobalCom2:
Users who are interested in GlobalCom2 are comparing it to the libraries listed below.
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆22 · Updated 2 months ago
- ☆40 · Updated 2 months ago
- Code release for VTW (AAAI 2025 Oral) ☆33 · Updated 2 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆31 · Updated this week
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆20 · Updated last month
- The official repository for the paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆34 · Updated last month
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆26 · Updated 3 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆21 · Updated last month
- [ICLR 2025] γ-MoD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆33 · Updated last month
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆44 · Updated 4 months ago
- [ECCV 2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆16 · Updated 8 months ago
- ☆50 · Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆80 · Updated 3 weeks ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆31 · Updated 3 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆70 · Updated 9 months ago
- Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal … ☆46 · Updated last month
- ☆68 · Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆39 · Updated this week
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆41 · Updated 5 months ago
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens… ☆63 · Updated last week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆46 · Updated 2 weeks ago
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- Official code for the paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆59 · Updated 3 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆26 · Updated last month
- CLIP-MoE: Mixture of Experts for CLIP ☆29 · Updated 5 months ago
- ☆22 · Updated 4 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated 6 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆26 · Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆40 · Updated 3 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆18 · Updated last month