mengchuang123 / VASparse-github
[CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
☆34 · Updated 3 months ago
Alternatives and similar repositories for VASparse-github
Users interested in VASparse-github are comparing it to the libraries listed below.
- ☆86 · Updated 3 months ago
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆23 · Updated 2 weeks ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆30 · Updated 4 months ago
- [ECCV 2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆17 · Updated 11 months ago
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att… ☆23 · Updated 4 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆73 · Updated 2 months ago
- Official implementation of MIA-DPO ☆58 · Updated 5 months ago
- ☆49 · Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆85 · Updated 3 weeks ago
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 · Updated 4 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆34 · Updated last year
- The official repository for the ACL 2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models" ☆46 · Updated last month
- The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models" ☆14 · Updated 3 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆52 · Updated 2 weeks ago
- Official PyTorch code of ReKV (ICLR'25) ☆28 · Updated 3 months ago
- [CVPR 2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆23 · Updated 3 months ago
- Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆62 · Updated 2 weeks ago
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆20 · Updated 4 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆43 · Updated 5 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning ☆33 · Updated 2 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆28 · Updated last month
- [CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆111 · Updated 3 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆93 · Updated 8 months ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆36 · Updated 11 months ago
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆49 · Updated last month
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆59 · Updated last week
- [CVPR 2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models ☆12 · Updated last month
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation" ☆33 · Updated last month
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models ☆49 · Updated 11 months ago
- [NeurIPS 2024] The official code of the paper "Automated Multi-level Preference for MLLMs" ☆19 · Updated 9 months ago