hustvl / InfiniteVL
This is the official repository of InfiniteVL
☆76Updated last month
Alternatives and similar repositories for InfiniteVL
Users who are interested in InfiniteVL are comparing it to the libraries listed below.
- ☆58Updated 8 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆128Updated last month
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆74Updated last week
- ☆61Updated 3 weeks ago
- ☆63Updated 6 months ago
- ☆62Updated 2 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO☆77Updated 2 months ago
- ☆93Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆86Updated 6 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 5 months ago
- Official repo for UAE☆155Updated last month
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆206Updated 3 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated 2 months ago
- ☆96Updated 7 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆38Updated last week
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆48Updated this week
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆191Updated last month
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆156Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- Code for the Molmo2 Vision-Language Model☆139Updated last month
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆29Updated 3 weeks ago
- ☆66Updated 2 months ago
- A Large-scale Video Action Dataset☆341Updated 2 weeks ago
- ☆40Updated 2 months ago
- PyTorch implementation of NEPA☆303Updated last week
- Cambrian-S: Towards Spatial Supersensing in Video☆482Updated last month
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 3 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]☆99Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆159Updated 2 weeks ago
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models☆127Updated last month