nvidia-cosmos / cosmos-reason2
Cosmos-Reason2 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
☆64 · Updated 2 weeks ago
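The chain-of-thought pattern described above can be sketched as plain prompt construction. This is an illustrative assumption only: the prompt template, action vocabulary, and helper name below are hypothetical and not Cosmos-Reason2's actual interface.

```python
# Hypothetical sketch of a chain-of-thought prompt for an embodied
# reasoning VLM. The template and ACTIONS list are illustrative
# assumptions, not the model's real API.

ACTIONS = ["move_forward", "turn_left", "turn_right", "pick_up", "stop"]

def build_cot_prompt(observation: str, task: str) -> str:
    """Assemble a prompt that asks the model to reason step by step
    about the physical scene before committing to one embodied action."""
    return (
        f"Observation: {observation}\n"
        f"Task: {task}\n"
        "Think step by step about the physical scene, object states,\n"
        "and the consequences of each candidate action. Then answer with\n"
        f"exactly one action from: {', '.join(ACTIONS)}.\n"
        "Reasoning:"
    )

prompt = build_cot_prompt(
    observation="A mug sits at the table edge, handle facing the robot.",
    task="Safely grasp the mug.",
)
print(prompt)
```

The trailing "Reasoning:" cue invites the model to emit its chain of thought before the final action token, which is the usual way long-CoT models are prompted.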
Alternatives and similar repositories for cosmos-reason2
Users interested in cosmos-reason2 are comparing it to the repositories listed below.
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 · Updated 7 months ago
- NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks ☆200 · Updated last month
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆134 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ☆226 · Updated 9 months ago
- MiMo-Embodied ☆342 · Updated last month
- Detect corn stalks for micro-sensor insertion ☆13 · Updated last year
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding ☆46 · Updated 3 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆164 · Updated 3 months ago
- ☆42 · Updated 7 months ago
- Spot Sim2Real Infrastructure ☆100 · Updated 7 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA ☆83 · Updated this week
- A paper list of world models ☆28 · Updated 9 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆145 · Updated last week
- ☆60 · Updated 9 months ago
- Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications ☆274 · Updated this week
- ☆60 · Updated last month
- ☆78 · Updated 7 months ago
- Official implementation of BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation ☆100 · Updated 5 months ago
- The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight ☆72 · Updated last month
- Spatial Aptitude Training for Multimodal Language Models ☆21 · Updated last week
- ☆63 · Updated 10 months ago
- A vast array of Multi-Modal Embodied Robotic Foundation Models! ☆27 · Updated last year
- ☆26 · Updated 9 months ago
- Code for "Interactive Task Planning with Language Models" ☆33 · Updated 8 months ago
- Official implementation of the paper "EgoPet: Egomotion and Interaction Data from an Animal's Perspective" ☆28 · Updated 3 weeks ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆79 · Updated 7 months ago
- ☆57 · Updated 7 months ago
- ☆130 · Updated 3 months ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆51 · Updated 11 months ago
- [CVPR 2025] Source code for the paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning" ☆205 · Updated 3 months ago