NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆7,989 Updated last month
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the libraries listed below.
- A generative world for general-purpose robotics & embodied AI learning. ☆25,165 Updated this week
- ☆3,395 Updated last month
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,197 Updated last week
- Minimal reproduction of DeepSeek R1-Zero ☆11,811 Updated last month
- A suite of image and video neural tokenizers ☆1,627 Updated 3 months ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆10,179 Updated last week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,306 Updated 4 months ago
- SpatialLM: Large Language Model for Spatial Understanding ☆3,187 Updated 2 months ago
- High-resolution models for human tasks. ☆5,024 Updated 6 months ago
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,709 Updated 2 weeks ago
- NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills. ☆3,927 Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,003 Updated this week
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆9,613 Updated this week
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,815 Updated 5 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,280 Updated last week
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,416 Updated 4 months ago
- s1: Simple test-time scaling ☆6,394 Updated last week
- Utilities intended for use with Llama models. ☆7,036 Updated 3 weeks ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,665 Updated last week
- The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,059 Updated this week
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,867 Updated 2 months ago
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆21,673 Updated last week
- MAGI-1: Autoregressive Video Generation at Scale ☆3,191 Updated this week
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,821 Updated 2 months ago
- Witness the aha moment of VLM with less than $3. ☆3,688 Updated last week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,994 Updated 2 weeks ago
- ☆3,342 Updated 2 months ago
- A simple screen parsing tool towards pure vision based GUI agent ☆22,258 Updated 2 months ago
- Official inference repo for FLUX.1 models ☆21,848 Updated 3 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆2,715 Updated 2 weeks ago