HJYao00 / Awesome-Agentic-MLLMs
Agentic MLLMs
☆166Oct 24, 2025Updated 3 months ago
Alternatives and similar repositories for Awesome-Agentic-MLLMs
Users who are interested in Awesome-Agentic-MLLMs are comparing it to the repositories listed below.
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Jul 1, 2025Updated 7 months ago
- ☆24May 13, 2025Updated 9 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 5 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Feb 9, 2026Updated last week
- Fast and memory-efficient exact attention☆18Jan 23, 2026Updated 3 weeks ago
- ☆16Jun 10, 2025Updated 8 months ago
- More reliable Video Understanding Evaluation☆14Sep 23, 2025Updated 4 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆42Dec 29, 2025Updated last month
- A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi…☆46Feb 9, 2026Updated last week
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆76Nov 20, 2025Updated 2 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated 11 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆102Aug 30, 2025Updated 5 months ago
- [ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow☆35Oct 3, 2025Updated 4 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆42Mar 11, 2025Updated 11 months ago
- ☆73May 23, 2025Updated 8 months ago
- ☆16Aug 5, 2024Updated last year
- Quick Long Video Understanding [TMLR2025]☆75Oct 27, 2025Updated 3 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Jan 29, 2025Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".☆20Feb 26, 2025Updated 11 months ago
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆24Mar 24, 2025Updated 10 months ago
- Dr. MAS is an end-to-end RL training framework for multi-agent LLM systems, supporting the co-training of multiple (heterogeneous) LLMs.☆60Updated this week
- ☆22Dec 30, 2024Updated last year
- The paper list of multilingual pre-trained models (Continual Updated).☆24Jun 18, 2024Updated last year
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges☆27May 14, 2025Updated 9 months ago
- ☆19Dec 6, 2023Updated 2 years ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆46Jun 9, 2025Updated 8 months ago
- ☆46Dec 30, 2024Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆84Jan 21, 2026Updated 3 weeks ago
- [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv…☆27Jun 16, 2025Updated 8 months ago
- ☆77Jan 22, 2026Updated 3 weeks ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆63Oct 9, 2024Updated last year
- a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆37Apr 7, 2025Updated 10 months ago
- Official Repository of Personalized Visual Instruct Tuning☆34Mar 6, 2025Updated 11 months ago
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech…☆73Nov 4, 2025Updated 3 months ago
- StructSR: Refuse Spurious Details in Real-World Image Super-Resolution☆28Jan 16, 2025Updated last year
- SFT+RL boosts multimodal reasoning☆46Jun 27, 2025Updated 7 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 8, 2026Updated last week
- ☆68Feb 5, 2026Updated last week
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆13Jun 28, 2025Updated 7 months ago