The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
☆82 · Jan 16, 2026 · Updated last month
Alternatives and similar repositories for Mantis
Users interested in Mantis are comparing it to the repositories listed below.
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding ☆34 · Jan 16, 2026 · Updated last month
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆30 · Oct 30, 2025 · Updated 4 months ago
- Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models ☆162 · Jan 4, 2026 · Updated last month
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 · Dec 30, 2025 · Updated 2 months ago
- GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting (NOSSDAV 2025) ☆27 · Aug 15, 2025 · Updated 6 months ago
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates ☆22 · Jul 1, 2025 · Updated 8 months ago
- ☆34 · Jan 25, 2026 · Updated last month
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆93 · Dec 2, 2025 · Updated 3 months ago
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 5 months ago
- F1: A Vision Language Action Model Bridging Understanding and Generation to Actions ☆161 · Jan 2, 2026 · Updated 2 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025 ☆16 · Nov 24, 2024 · Updated last year
- A Curated List of Vision-Language-Action (VLA) Research ☆61 · Updated this week
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆31 · Updated this week
- [AAAI 2026] Official implementation of paper "UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajector… ☆43 · Jan 30, 2026 · Updated last month
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs ☆38 · Feb 19, 2026 · Updated last week
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated 11 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System. ☆19 · Oct 14, 2024 · Updated last year
- ☆22 · Feb 15, 2026 · Updated 2 weeks ago
- Dr. MAS is an end-to-end RL training framework for multi-agent LLM systems, supporting the co-training of multiple (heterogeneous) LLMs. ☆89 · Feb 11, 2026 · Updated 2 weeks ago
- ☆43 · Jan 30, 2026 · Updated last month
- AI model training on heterogeneous, geo-distributed resources ☆37 · Nov 24, 2025 · Updated 3 months ago
- Hands-On Image Processing with Python, Second Edition, Published by Packt ☆26 · Feb 11, 2026 · Updated 2 weeks ago
- Official code of "HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud … ☆25 · Oct 31, 2025 · Updated 4 months ago
- 🎮Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien… ☆27 · Oct 10, 2025 · Updated 4 months ago
- ☆33 · Jul 15, 2025 · Updated 7 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆105 · Jan 14, 2026 · Updated last month
- This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents wit… ☆55 · Feb 7, 2026 · Updated 3 weeks ago
- Vocabulary Parallelism ☆25 · Mar 10, 2025 · Updated 11 months ago
- Test-time Scaling for VAR models ☆31 · Sep 19, 2025 · Updated 5 months ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- [ICCV25] TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers ☆41 · Jul 23, 2025 · Updated 7 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 · Mar 17, 2025 · Updated 11 months ago
- Implementation for Robust LiDAR-Camera Calibration with 2D Gaussian Splatting ☆38 · Oct 19, 2025 · Updated 4 months ago
- ☆60 · Jan 12, 2026 · Updated last month
- An adaptive sampling framework for Reinforce-style LLM post training. ☆90 · Nov 29, 2025 · Updated 3 months ago
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation ☆106 · Jan 27, 2026 · Updated last month
- Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ☆125 · Feb 14, 2025 · Updated last year
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ☆59 · Feb 6, 2026 · Updated 3 weeks ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago