SaraGhazanfari / EMMA
EMMA [TMLR 2025]
☆12 · Updated 4 months ago
Alternatives and similar repositories for EMMA
Users who are interested in EMMA are comparing it to the repositories listed below.
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… ☆51 · Updated last week
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆57 · Updated this week
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆87 · Updated 11 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Updated 5 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆84 · Updated 3 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆110 · Updated last year
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆141 · Updated 10 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding ☆47 · Updated 4 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆104 · Updated 4 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆54 · Updated 3 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆132 · Updated 4 months ago
- Official implementation of MIA-DPO ☆70 · Updated last year
- Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space" ☆54 · Updated last month
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆205 · Updated 6 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆83 · Updated 6 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆39 · Updated last year
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆70 · Updated 3 months ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆47 · Updated last month
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" ☆110 · Updated last month
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection ☆49 · Updated 10 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning ☆96 · Updated 4 months ago
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology ☆72 · Updated this week
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att… ☆62 · Updated 3 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆47 · Updated 6 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆78 · Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆33 · Updated last year
- ☆47 · Updated last week
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025) ☆88 · Updated 4 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆97 · Updated 2 months ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆60 · Updated last year