hustvl / InfiniteVL
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
☆77Updated last week
Alternatives and similar repositories for InfiniteVL
Users who are interested in InfiniteVL are comparing it to the repositories listed below.
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆129Updated last month
- ☆58Updated 8 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆79Updated this week
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆64Updated last week
- ☆63Updated 6 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆57Updated 2 weeks ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO☆78Updated 2 months ago
- ☆63Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated last week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆86Updated 6 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆194Updated last month
- ☆93Updated last month
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆109Updated last month
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆157Updated 8 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆33Updated last month
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models☆128Updated last month
- ☆63Updated last week
- Official repo for UAE☆164Updated last month
- ☆97Updated 7 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆207Updated 3 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆166Updated 10 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"☆79Updated 2 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models☆169Updated last month
- ☆68Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆139Updated 5 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆113Updated 2 months ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…☆98Updated this week
- PyTorch implementation of NEPA☆308Updated 2 weeks ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- ☆141Updated 3 months ago