xuyang-liu16 / GlobalCom2
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆18 Updated last week
Alternatives and similar repositories for GlobalCom2:
Users who are interested in GlobalCom2 are comparing it to the repositories listed below.
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆26 Updated 3 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆21 Updated 2 months ago
- Code release for VTW (AAAI 2025) Oral ☆34 Updated 3 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆35 Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆40 Updated this week
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆46 Updated 5 months ago
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆29 Updated 4 months ago
- ☆42 Updated 3 months ago
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆35 Updated 3 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆31 Updated 6 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆27 Updated 2 months ago
- Official implementation of MC-LLaVA. ☆24 Updated 2 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆58 Updated 4 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆53 Updated last week
- ☆34 Updated 9 months ago
- ☆72 Updated last month
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆161 Updated 3 months ago
- Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal … ☆47 Updated last month
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆20 Updated last month
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆19 Updated 2 months ago
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆40 Updated last week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆71 Updated 10 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆23 Updated last week
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 Updated 2 months ago
- [CVPR' 25] Interleaved-Modal Chain-of-Thought ☆24 Updated 3 weeks ago
- 📚 Collection of token reduction for model compression resources. ☆51 Updated last week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 Updated last month
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆16 Updated 9 months ago
- Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ☆32 Updated last month
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆35 Updated 3 weeks ago