lntzm / HICom
[CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
☆13 · Updated 2 months ago
Alternatives and similar repositories for HICom
Users interested in HICom are comparing it to the repositories listed below.
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ☆38 · Updated last month
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 · Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆80 · Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆83 · Updated last month
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆34 · Updated 5 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆68 · Updated 2 months ago
- Official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 · Updated 2 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 · Updated 4 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆34 · Updated 3 months ago
- [CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning ☆20 · Updated 3 weeks ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆46 · Updated 6 months ago
- ☆17 · Updated 2 months ago
- [ICML 2025] Official implementation of paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…" ☆141 · Updated last week
- Official implementation of MC-LLaVA ☆32 · Updated last month
- [CVPR 2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆23 · Updated 3 months ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆37 · Updated last year
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆30 · Updated 6 months ago
- VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning ☆32 · Updated 3 weeks ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆29 · Updated 3 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆24 · Updated last month
- Official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding…" ☆56 · Updated 8 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆29 · Updated 2 weeks ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated last month
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆33 · Updated last month
- [CVPR 2025] Official PyTorch Code for "MMRL: Multi-Modal Representation Learning for Vision-Language Models" and its extension "MMRL++: P…" ☆57 · Updated 3 weeks ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…" ☆39 · Updated 3 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆19 · Updated 5 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 · Updated 3 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆94 · Updated last month
- [CVPR 2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆97 · Updated last month