MCG-NJU / p-MoD
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆43 · Jun 26, 2025 · Updated 7 months ago
Alternatives and similar repositories for p-MoD
Users who are interested in p-MoD are comparing it to the libraries listed below.
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆42 · Oct 28, 2025 · Updated 3 months ago
- A Fine-grained Benchmark for Video Captioning and Retrieval ☆26 · Jul 16, 2025 · Updated 7 months ago
- [ICML 2025] Differentiable Solver Search for Fast Diffusion Sampling ☆21 · Jul 7, 2025 · Updated 7 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆70 · Oct 17, 2025 · Updated 3 months ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 · Dec 30, 2025 · Updated last month
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 7 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆107 · Apr 25, 2025 · Updated 9 months ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated 2 weeks ago
- ☆12 · Dec 4, 2024 · Updated last year
- [COLING 2025🔥] Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection ☆16 · Jan 21, 2025 · Updated last year
- LMM for VQA, tcsvt version ☆11 · Jul 19, 2024 · Updated last year
- Applies ROME and MEMIT on Mamba-S4 models ☆14 · Apr 5, 2024 · Updated last year
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆103 · Feb 2, 2026 · Updated 2 weeks ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆21 · Jun 15, 2025 · Updated 8 months ago
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 · May 23, 2025 · Updated 8 months ago
- ☆24 · May 23, 2025 · Updated 8 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆33 · Oct 12, 2024 · Updated last year
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆256 · Oct 18, 2025 · Updated 3 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online ☆89 · Oct 7, 2025 · Updated 4 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- Agent-based implementation of RAG, incorporating AI agents into the RAG pipeline to orchestrate its components and perform additional act… ☆19 · Feb 20, 2025 · Updated 11 months ago
- Motion-Aware Generative Frame Interpolation ☆48 · Mar 11, 2025 · Updated 11 months ago
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners ☆22 · Jun 6, 2025 · Updated 8 months ago
- ☆19 · Mar 25, 2025 · Updated 10 months ago
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆22 · Apr 22, 2025 · Updated 9 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 · Nov 12, 2024 · Updated last year
- ☆16 · Jul 6, 2023 · Updated 2 years ago
- DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging ☆47 · Apr 27, 2025 · Updated 9 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" and "Sp… ☆241 · Dec 22, 2025 · Updated last month
- [NeurIPS 2024] An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆21 · Oct 10, 2024 · Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". ☆20 · Feb 26, 2025 · Updated 11 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆74 · Oct 14, 2024 · Updated last year
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆21 · Jan 29, 2025 · Updated last year
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework ☆44 · Jan 25, 2026 · Updated 3 weeks ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · May 28, 2024 · Updated last year
- ☆19 · Jan 10, 2025 · Updated last year
- A Large Multimodal Model for Remote Sensing Change Description (IGARSS 2025) ☆22 · Dec 17, 2025 · Updated last month
- ☆20 · Nov 4, 2025 · Updated 3 months ago
- [CVPR 2024] Asymmetric Masked Distillation for Pre-Training Small Foundation Models ☆18 · Jan 11, 2026 · Updated last month