NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,056 Updated 5 months ago
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the repositories listed below.
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,679 Updated last week
- A suite of image and video neural tokenizers ☆1,691 Updated 9 months ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆11,357 Updated 2 weeks ago
- ☆9,107 Updated 2 weeks ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆16,785 Updated last week
- A generative world for general-purpose robotics & embodied AI learning. ☆27,712 Updated this week
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,102 Updated 2 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,624 Updated 10 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,761 Updated last week
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆4,573 Updated 8 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,867 Updated 2 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,503 Updated 3 months ago
- Wan: Open and Advanced Large-Scale Video Generative Models ☆14,798 Updated this week
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆11,109 Updated last month
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,033 Updated 7 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,911 Updated 11 months ago
- MAGI-1: Autoregressive Video Generation at Scale ☆3,563 Updated 5 months ago
- Open-source unified multimodal model ☆5,409 Updated last month
- Official repository for LTX-Video ☆8,850 Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,974 Updated 3 weeks ago
- Witness the aha moment of VLM with less than $3. ☆3,997 Updated 6 months ago
- s1: Simple test-time scaling ☆6,605 Updated 5 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,331 Updated last month
- High-resolution models for human tasks. ☆5,233 Updated last year
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,068 Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆17,881 Updated 11 months ago
- The best OSS video generation models, created by Genmo ☆3,521 Updated 3 weeks ago
- Minimal reproduction of DeepSeek R1-Zero ☆12,444 Updated 7 months ago
- [IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,625 Updated last month
- Sky-T1: Train your own O1 preview model within $450 ☆3,358 Updated 4 months ago