Agentic MLLMs
☆170 · Oct 24, 2025 · Updated 4 months ago
Alternatives and similar repositories for Awesome-Agentic-MLLMs
Users who are interested in Awesome-Agentic-MLLMs are comparing it to the libraries listed below.
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- ☆26 · May 13, 2025 · Updated 9 months ago
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model ☆12 · Feb 11, 2025 · Updated last year
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025 ☆13 · Aug 25, 2025 · Updated 6 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆16 · Feb 9, 2026 · Updated last month
- ☆16 · Jun 10, 2025 · Updated 8 months ago
- Fast and memory-efficient exact attention ☆19 · Updated this week
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 5 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 2 months ago
- A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi… ☆46 · Feb 15, 2026 · Updated 3 weeks ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆77 · Nov 20, 2025 · Updated 3 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated 11 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆103 · Aug 30, 2025 · Updated 6 months ago
- [ICLR 2026] SR-Scientist: Scientific Equation Discovery With Agentic AI ☆33 · Jan 27, 2026 · Updated last month
- [ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow ☆36 · Oct 3, 2025 · Updated 5 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆43 · Mar 11, 2025 · Updated 11 months ago
- ☆73 · May 23, 2025 · Updated 9 months ago
- ☆16 · Aug 5, 2024 · Updated last year
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆21 · Jan 29, 2025 · Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". ☆20 · Feb 26, 2025 · Updated last year
- Quick Long Video Understanding [TMLR2025] ☆76 · Oct 27, 2025 · Updated 4 months ago
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding ☆24 · Mar 24, 2025 · Updated 11 months ago
- ☆22 · Dec 30, 2024 · Updated last year
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆28 · May 14, 2025 · Updated 9 months ago
- The paper list of multilingual pre-trained models (continually updated). ☆24 · Jun 18, 2024 · Updated last year
- ☆19 · Dec 6, 2023 · Updated 2 years ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆46 · Jun 9, 2025 · Updated 9 months ago
- ☆46 · Dec 30, 2024 · Updated last year
- [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv… ☆27 · Jun 16, 2025 · Updated 8 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆86 · Jan 21, 2026 · Updated last month
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆63 · Oct 9, 2024 · Updated last year
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech… ☆74 · Nov 4, 2025 · Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆34 · Mar 6, 2025 · Updated last year
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ☆37 · Apr 7, 2025 · Updated 11 months ago
- ☆78 · Jan 22, 2026 · Updated last month
- StructSR: Refuse Spurious Details in Real-World Image Super-Resolution ☆28 · Jan 16, 2025 · Updated last year
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 · Oct 20, 2025 · Updated 4 months ago
- SFT+RL boosts multimodal reasoning ☆46 · Jun 27, 2025 · Updated 8 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago