InnovatorLM / Innovator-VL
Fully Open-source Multimodal Language Models for Science Discovery
☆107 · Updated 2 weeks ago
Alternatives and similar repositories for Innovator-VL
Users that are interested in Innovator-VL are comparing it to the libraries listed below
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs ☆97 · Updated 2 weeks ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆147 · Updated 6 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆136 · Updated 5 months ago
- [ACL 2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆119 · Updated 6 months ago
- [arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning ☆61 · Updated last month
- [arXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models ☆128 · Updated last month
- Official PyTorch implementation of TokenSet. ☆127 · Updated 10 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆169 · Updated last month
- ☆94 · Updated 3 months ago
- Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its… ☆390 · Updated 3 weeks ago
- This is the official Python version of Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play. ☆115 · Updated 3 months ago
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation ☆309 · Updated last month
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆70 · Updated 3 months ago
- Official code of "Monet: Reasoning in Latent Visual Space Beyond Image and Language" ☆125 · Updated last week
- (ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆186 · Updated this week
- This repository collects and organises state-of-the-art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). ☆278 · Updated this week
- ☆63 · Updated 7 months ago
- [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s M… ☆602 · Updated last month
- ☆37 · Updated 2 months ago
- [ICCV 2025] WikiAutoGen official page ☆24 · Updated this week
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆37 · Updated last year
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39 · Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 · Updated 6 months ago
- ☆42 · Updated 6 months ago
- ☆17 · Updated 6 months ago
- The official repo for "Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem" [EMNLP 2025] ☆34 · Updated 5 months ago
- ☆517 · Updated 2 weeks ago
- The open-source code of MetaStone-S1. ☆105 · Updated 6 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 · Updated last year
- An open source implementation of CLIP (with TULIP support) ☆165 · Updated 8 months ago