NVIDIA / Cosmos
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline, built to accelerate the development of Physical AI at robotics and AV labs. Cosmos is purpose-built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate vide…
☆7,810 Updated last week
Alternatives and similar repositories for Cosmos:
Users who are interested in Cosmos are comparing it to the libraries listed below:
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆9,282 Updated this week
- A suite of image and video neural tokenizers ☆1,582 Updated last month
- A generative world for general-purpose robotics & embodied AI learning. ☆24,579 Updated this week
- SpatialLM: Large Language Model for Spatial Understanding ☆2,563 Updated this week
- ☆2,753 Updated last week
- Fully open reproduction of DeepSeek-R1 ☆23,467 Updated this week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆16,878 Updated last month
- Witness the aha moment of VLM with less than $3. ☆3,430 Updated 3 weeks ago
- ☆3,249 Updated 3 weeks ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,071 Updated this week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,513 Updated this week
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,362 Updated last week
- Wan: Open and Advanced Large-Scale Video Generative Models ☆9,327 Updated this week
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆4,628 Updated last month
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,796 Updated this week
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆14,761 Updated 3 months ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆9,454 Updated 2 weeks ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,149 Updated this week
- Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your resea… ☆4,026 Updated this week
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑🔬 ☆10,526 Updated last week
- A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes… ☆2,232 Updated 2 months ago
- An open source deep research clone. AI Agent that reasons large amounts of web data extracted with Firecrawl ☆5,208 Updated last month
- s1: Simple test-time scaling ☆6,086 Updated 3 weeks ago
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero ☆11,419 Updated 2 weeks ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,867 Updated last month
- This package contains the original 2012 AlexNet code. ☆2,258 Updated 2 weeks ago
- Vision agent ☆4,420 Updated this week
- Everything you need to build state-of-the-art foundation models, end-to-end. ☆7,775 Updated this week
- A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models. ☆4,919 Updated last month
- Solve Visual Understanding with Reinforced VLMs ☆4,400 Updated last week