NVIDIA / Cosmos

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

☆8,058

Alternatives and similar repositories for Cosmos

Users who are interested in Cosmos are comparing it to the repositories listed below.

NVIDIA / Cosmos-Tokenizer
A suite of image and video neural tokenizers
☆1,656 Updated 5 months ago
Genesis-Embodied-AI / Genesis
A generative world for general-purpose robotics & embodied AI learning.
☆26,042 Updated this week
Physical-Intelligence / openpi
☆4,114 Updated this week
openvla / openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
☆3,368 Updated 4 months ago
facebookresearch / vjepa2
PyTorch code and models for VJEPA2 self-supervised learning from video.
☆1,954 Updated 3 weeks ago
OpenDriveLab / AgiBot-World
[IROS 2025] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
☆2,233 Updated last week
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,450 Updated last week
QwenLM / Qwen2.5-VL
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆11,773 Updated 2 months ago
Tencent-Hunyuan / HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
☆10,762 Updated 3 weeks ago
manycore-research / SpatialLM
SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆3,539 Updated last week
MoonshotAI / Kimi-k1.5
☆3,453 Updated 4 months ago
microsoft / Magma
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
☆1,763 Updated 2 months ago
facebookresearch / sapiens
High-resolution models for human tasks.
☆5,088 Updated 8 months ago
NVlabs / Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
☆4,396 Updated last week
SandAI-org / MAGI-1
MAGI-1: Autoregressive Video Generation at Scale
☆3,409 Updated last month
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆4,687 Updated 3 weeks ago
isaac-sim / IsaacLab
Unified framework for robot learning built on NVIDIA Isaac Sim
☆4,475 Updated this week
eloialonso / diamond
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S…
☆1,841 Updated 7 months ago
deepseek-ai / DeepSeek-VL2
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
☆4,981 Updated 5 months ago
genmoai / mochi
The best OSS video generation models
☆3,321 Updated 6 months ago
stepfun-ai / Step-Video-T2V
☆3,082 Updated 4 months ago
facebookresearch / sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆16,353 Updated 7 months ago
MiniMax-AI / MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
☆3,074 Updated 3 weeks ago
NVIDIA / Isaac-GR00T
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
☆4,509 Updated last week
apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,752 Updated last month
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆3,873 Updated 2 months ago
deepseek-ai / DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
☆3,937 Updated last year
yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,889 Updated 4 months ago
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆3,799 Updated last week
SonyResearch / micro_diffusion
Official repository for our work on micro-budget training of large-scale diffusion models.
☆1,501 Updated 6 months ago