NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,062 · Updated this week
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the libraries listed below.
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,718 · Updated last month
- A suite of image and video neural tokenizers ☆1,697 · Updated 11 months ago
- A generative world for general-purpose robotics & embodied AI learning. ☆27,922 · Updated this week
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆7,370 · Updated 11 months ago
- High-resolution models for human tasks. ☆5,257 · Updated last year
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,686 · Updated 4 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,152 · Updated 8 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,889 · Updated 3 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,752 · Updated 11 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆18,215 · Updated last year
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,168 · Updated 3 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,884 · Updated 2 weeks ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,427 · Updated 10 months ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆17,662 · Updated this week
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,494 · Updated 2 months ago
- NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots. ☆5,756 · Updated 3 weeks ago
- Fully open reproduction of DeepSeek-R1 ☆25,785 · Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,780 · Updated 7 months ago
- Open-source unified multimodal model ☆5,539 · Updated 2 months ago
- The best OSS video generation models, created by Genmo ☆3,562 · Updated last month
- [CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer ☆12,137 · Updated 3 months ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆7,030 · Updated 9 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,668 · Updated 11 months ago
- Reference PyTorch implementation and models for DINOv3 ☆9,158 · Updated last month
- ☆9,654 · Updated 2 weeks ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,038 · Updated 3 weeks ago
- [IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,719 · Updated 3 weeks ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,944 · Updated last year
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,943 · Updated last year
- A unified inference and post-training framework for accelerated video generation. ☆2,898 · Updated this week