nvidia-cosmos / cosmos-rl
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
☆63 · Updated this week
Alternatives and similar repositories for cosmos-rl
Users interested in cosmos-rl are comparing it to the libraries listed below.
- ☆22 · Updated 4 months ago
- [ECCV 2024] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving ☆30 · Updated 7 months ago
- [CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation ☆81 · Updated this week
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆135 · Updated last month
- [CVPR 2025 Highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆23 · Updated last month
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆64 · Updated last month
- Main repo for the SimWorld simulator. ☆53 · Updated last month
- Unified Vision-Language-Action Model ☆128 · Updated 2 weeks ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆75 · Updated last month
- Nvidia GEAR Lab's initiative to solve the robotics data problem using world models ☆217 · Updated 3 weeks ago
- [arXiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆26 · Updated this week
- ☆17 · Updated last year
- ☆21 · Updated this week
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach… ☆28 · Updated this week
- Memory Efficient Training Framework for Large Video Generation Model ☆25 · Updated last year
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆25 · Updated 3 weeks ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆109 · Updated 8 months ago
- Implementation of the Large Behavioral Model architecture for dexterous manipulation from Toyota Research Institute ☆34 · Updated this week
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration ☆50 · Updated 2 months ago
- ☆40 · Updated this week
- Code release for "Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning" (NeurIPS 2023), https://ar… ☆66 · Updated 9 months ago
- [ICCV 2025] VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers ☆51 · Updated 2 weeks ago
- Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆30 · Updated this week
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆109 · Updated last month
- Official implementation of the T-PAMI 2025 paper "M²Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes" ☆63 · Updated last month
- Code for Draft Attention ☆87 · Updated last month
- MTGS: Multi-Traversal Gaussian Splatting ☆80 · Updated 2 weeks ago
- An initiative to pioneer the training of long-context multi-modal transformer models ☆41 · Updated this week
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs ☆75 · Updated last year
- Make your wildest 3D ConvNet dream architectures come true ☆79 · Updated this week