NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,091 Updated last month
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the libraries listed below.
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,737 Updated 2 months ago
- A suite of image and video neural tokenizers ☆1,702 Updated 11 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,903 Updated 5 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,806 Updated last year
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,499 Updated 11 months ago
- ☆10,083 Updated last month
- A generative world for general-purpose robotics & embodied AI learning. ☆28,101 Updated this week
- [IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,766 Updated last month
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆5,192 Updated 10 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,625 Updated 3 months ago
- High-resolution models for human tasks. ☆5,277 Updated last year
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,083 Updated last year
- NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots. ☆6,082 Updated 2 weeks ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,223 Updated 4 months ago
- Open-Sora: Democratizing Efficient Video Production for All ☆28,492 Updated 9 months ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆7,042 Updated 10 months ago
- ☆3,908 Updated last year
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,894 Updated 2 weeks ago
- Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation ☆3,477 Updated 2 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,698 Updated last year
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,960 Updated last year
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆5,214 Updated 11 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,370 Updated 6 months ago
- Next-Token Prediction is All You Need ☆2,314 Updated 3 weeks ago
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆7,194 Updated 9 months ago
- MAGI-1: Autoregressive Video Generation at Scale ☆3,639 Updated 7 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆7,544 Updated last year
- NVIDIA Isaac Sim™ is an open-source application on NVIDIA Omniverse for developing, simulating, and testing AI-driven robots in realistic… ☆2,495 Updated last month
- Unified framework for robot learning built on NVIDIA Isaac Sim ☆6,224 Updated last week
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,178 Updated last week