NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,067 Updated 5 months ago
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the libraries listed below.
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,644 Updated 3 weeks ago
- A suite of image and video neural tokenizers ☆1,679 Updated 9 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,080 Updated last month
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆4,415 Updated 7 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,399 Updated 2 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,844 Updated last month
- ☆8,700 Updated 3 weeks ago
- Open-source unified multimodal model ☆5,282 Updated 2 weeks ago
- A generative world for general-purpose robotics & embodied AI learning. ☆27,568 Updated last week
- NVIDIA Isaac GR00T N1.5 - A Foundation Model for Generalist Robots. ☆5,322 Updated last week
- MAGI-1: Autoregressive Video Generation at Scale ☆3,534 Updated 4 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,608 Updated 9 months ago
- Witness the aha moment of VLM with less than $3. ☆3,980 Updated 5 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,229 Updated 2 weeks ago
- Minimal reproduction of DeepSeek R1-Zero ☆12,383 Updated 6 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,689 Updated last week
- 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning ☆19,188 Updated last week
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆16,124 Updated 2 weeks ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,975 Updated 6 months ago
- Reference PyTorch implementation and models for DINOv3 ☆8,214 Updated last week
- ☆3,468 Updated 8 months ago
- [IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,596 Updated 2 weeks ago
- Fully open reproduction of DeepSeek-R1 ☆25,629 Updated 2 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆3,236 Updated 4 months ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆11,254 Updated 2 months ago
- ☆3,129 Updated 7 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,770 Updated 5 months ago
- High-resolution models for human tasks. ☆5,207 Updated 11 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,891 Updated 11 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,270 Updated 8 months ago