NVIDIA / Cosmos
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline, built to accelerate the development of Physical AI at robotics and AV labs. Cosmos is purpose-built for Physical AI. The Cosmos repository enables end users to run the Cosmos models, run inference scripts, and generate videos.
☆7,847 · Updated this week
Alternatives and similar repositories for Cosmos:
Users interested in Cosmos are comparing it to the repositories listed below.
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero ☆11,419 · Updated 3 weeks ago
- A suite of image and video neural tokenizers ☆1,590 · Updated last month
- Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆9,440 · Updated last week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆16,985 · Updated 2 months ago
- s1: Simple test-time scaling ☆6,086 · Updated 3 weeks ago
- ☆2,753 · Updated 2 weeks ago
- SpatialLM: Large Language Model for Spatial Understanding ☆2,563 · Updated this week
- NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills. ☆2,927 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud. ☆3,086 · Updated last week
- Sky-T1: Train your own O1 preview model within $450 ☆3,167 · Updated last week
- Fully open reproduction of DeepSeek-R1 ☆23,467 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆5,994 · Updated this week
- Witness the aha moment of VLM with less than $3. ☆3,430 · Updated last month
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,513 · Updated this week
- 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning ☆11,483 · Updated this week
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆4,662 · Updated last month
- Composable building blocks to build Llama Apps ☆7,597 · Updated this week
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆3,739 · Updated 11 months ago
- The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆1,864 · Updated this week
- Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research… ☆4,110 · Updated last week
- Simple RL training for reasoning ☆3,326 · Updated this week
- Official inference framework for 1-bit LLMs ☆12,851 · Updated last month
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,416 · Updated last week
- Everything you need to build state-of-the-art foundation models, end-to-end. ☆7,775 · Updated this week
- ☆3,249 · Updated 3 weeks ago
- Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥 ☆36,320 · Updated this week
- This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API. ☆5,334 · Updated 2 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework ☆3,377 · Updated this week
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆9,528 · Updated 3 weeks ago
- Vision agent ☆4,443 · Updated this week