NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆7,945 · Updated last week
Alternatives and similar repositories for Cosmos:
Users who are interested in Cosmos compare it to the repositories listed below.
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,262 · Updated this week
- A suite of image and video neural tokenizers ☆1,621 · Updated 2 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,201 · Updated last week
- ☆3,189 · Updated last week
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,747 · Updated last month
- NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills. ☆3,635 · Updated 3 weeks ago
- SpatialLM: Large Language Model for Spatial Understanding ☆3,132 · Updated last month
- A generative world for general-purpose robotics & embodied AI learning. ☆24,924 · Updated this week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,204 · Updated 3 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,623 · Updated this week
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,804 · Updated 5 months ago
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆3,816 · Updated last year
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆4,793 · Updated 2 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,098 · Updated this week
- MAGI-1: Autoregressive Video Generation at Scale ☆2,857 · Updated last week
- [ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior ☆2,920 · Updated 2 weeks ago
- ☆3,325 · Updated 2 months ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,710 · Updated last month
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆4,887 · Updated 7 months ago
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ☆5,711 · Updated 7 months ago
- Composable building blocks to build Llama Apps ☆7,766 · Updated this week
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆2,867 · Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,719 · Updated 2 months ago
- Vision agent ☆4,558 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆7,626 · Updated this week
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆908 · Updated last week
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆9,293 · Updated last week
- Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥 ☆38,242 · Updated this week
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,979 · Updated 2 months ago
- Democratizing Reinforcement Learning for LLMs ☆3,182 · Updated 3 weeks ago