nvidia-cosmos / cosmos-reason1
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
☆517 · Updated last week
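For orientation, the sketch below shows one way such a model could be queried for embodied chain-of-thought reasoning over a single image. It is a hedged example, not the project's documented usage: the Hugging Face model ID, the image-text-to-text auto class, the prompt, and the image URL are assumptions and may not match the repository's actual inference stack; defer to the repo's own scripts for supported usage.

```python
# Minimal sketch, not taken from the repository: it assumes a Cosmos-Reason1
# checkpoint is published on Hugging Face under the ID below and that it can be
# driven through transformers' generic image-text-to-text interface (recent
# transformers release). Model ID, prompt, and image URL are placeholders.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nvidia/Cosmos-Reason1-7B"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # needs accelerate
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/robot_scene.jpg"},  # placeholder
        {"type": "text", "text": "Think step by step about what the robot should "
                                 "do next, then state the final action."},
    ],
}]

# Tokenize the chat, generate a long chain-of-thought answer, and strip the prompt tokens.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```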
Alternatives and similar repositories for cosmos-reason1
Users interested in cosmos-reason1 are comparing it to the repositories listed below
- Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environment… ☆514 · Updated last week
- Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world m… ☆273 · Updated last week
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos ☆312 · Updated 5 months ago
- [RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions ☆474 · Updated last week
- Embodied Chain of Thought: A robotic policy that reasons to solve tasks. ☆267 · Updated 2 months ago
- Official repo and evaluation implementation of VSI-Bench ☆522 · Updated 3 months ago
- ☆363 · Updated 5 months ago
- World modeling challenge for humanoid robots ☆490 · Updated 7 months ago
- Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence ☆441 · Updated 2 weeks ago
- Re-implementation of the pi0 vision-language-action (VLA) model from Physical Intelligence ☆968 · Updated 4 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video ☆1,522 · Updated this week
- A flexible and efficient codebase for training visually-conditioned language models (VLMs) ☆712 · Updated 11 months ago
- Embodied Reasoning Question Answer (ERQA) Benchmark ☆167 · Updated 3 months ago
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success ☆481 · Updated last month
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations https://video-prediction-policy.github.io ☆207 · Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆577 · Updated 3 weeks ago
- Compose multimodal datasets 🎹 ☆413 · Updated 2 weeks ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆213 · Updated 3 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆353 · Updated 2 months ago
- Nvidia GEAR Lab's initiative to solve the robotics data problem using world models ☆163 · Updated last week
- 🔥 SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025. ☆360 · Updated this week
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ☆291 · Updated 9 months ago
- A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruc… ☆304 · Updated 2 months ago
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… ☆661 · Updated 2 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆270 · Updated 3 weeks ago
- Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation" ☆261 · Updated last year
- [ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model ☆531 · Updated 7 months ago
- PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024). ☆163 · Updated 10 months ago
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation ☆290 · Updated 3 weeks ago
- ☆206 · Updated 3 months ago