NVIDIA / Cosmos
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline, built to accelerate the development of Physical AI at robotics and AV labs. Cosmos is purpose-built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate vide…
☆7,847 · Updated this week
Alternatives and similar repositories for Cosmos:
Users who are interested in Cosmos also compare it with the repositories listed below:
- A generative world for general-purpose robotics & embodied AI learning. ☆24,579 · Updated this week
- ☆2,753 · Updated last week
- A suite of image and video neural tokenizers ☆1,590 · Updated last month
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,655 · Updated 2 weeks ago
- Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆9,282 · Updated this week
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,491 · Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,071 · Updated last week
- SpatialLM: Large Language Model for Spatial Understanding ☆2,563 · Updated this week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention ☆2,415 · Updated 2 weeks ago
- Official repository for LTX-Video ☆3,221 · Updated 3 weeks ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆2,416 · Updated last week
- Everything you need to build state-of-the-art foundation models, end-to-end. ☆7,775 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆5,994 · Updated this week
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆14,761 · Updated 3 months ago
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,919 · Updated 2 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,364 · Updated 2 months ago
- Wan: Open and Advanced Large-Scale Video Generative Models ☆9,327 · Updated this week
- Official inference framework for 1-bit LLMs ☆12,851 · Updated last month
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆16,985 · Updated 2 months ago
- ☆8,748 · Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,796 · Updated this week
- Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud. ☆16,443 · Updated 2 weeks ago
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero ☆11,419 · Updated 3 weeks ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,513 · Updated this week
- High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models. ☆8,069 · Updated this week
- ☆3,630 · Updated last month
- High-resolution models for human tasks. ☆4,912 · Updated 4 months ago
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆4,628 · Updated last month
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,248 · Updated 4 months ago
- SGLang is a fast serving framework for large language models and vision language models. ☆12,672 · Updated this week