DoubtedSteam / RoELinks
The official implement of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models"
☆15Updated 3 months ago
Alternatives and similar repositories for RoE
Users that are interested in RoE are comparing it to the libraries listed below
Sorting:
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆37Updated last year
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆34Updated 5 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)☆38Updated 2 months ago
- ☆16Updated 3 months ago
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att…☆26Updated 4 months ago
- ☆89Updated 3 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆43Updated 6 months ago
- [ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models☆19Updated 3 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification☆34Updated 3 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆96Updated 9 months ago
- Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs☆17Updated 2 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆64Updated 3 weeks ago
- Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs☆24Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models☆60Updated 2 weeks ago
- [NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'☆184Updated this week
- CLIP-MoE: Mixture of Experts for CLIP☆42Updated 9 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆71Updated this week
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆12Updated 7 months ago
- [CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆74Updated 3 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆80Updated 2 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆89Updated 7 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆57Updated 6 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆230Updated 2 months ago
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models☆50Updated last year
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models☆17Updated last year
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆24Updated last month
- Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"☆39Updated last month
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆43Updated 5 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆35Updated last year
- [ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation☆32Updated 4 months ago