NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,088 Updated 3 weeks ago
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the repositories listed below.
- A suite of image and video neural tokenizers ☆1,702 Updated 11 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,853 Updated 5 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,731 Updated 2 months ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,066 Updated this week
- Open-source unified multimodal model ☆5,601 Updated 3 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,204 Updated 4 months ago
- Reference PyTorch implementation and models for DINOv3 ☆9,393 Updated 2 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,589 Updated 3 months ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,113 Updated last week
- Witness the aha moment of VLM with less than $3. ☆4,025 Updated 8 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,687 Updated 11 months ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆5,131 Updated 10 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,204 Updated 9 months ago
- Fully open reproduction of DeepSeek-R1 ☆25,842 Updated 2 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,890 Updated last week
- Sky-T1: Train your own O1 preview model within $450 ☆3,369 Updated 6 months ago
- ☆4,523 Updated 4 months ago
- Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long c… ☆886 Updated 3 weeks ago
- MAGI-1: Autoregressive Video Generation at Scale ☆3,635 Updated 7 months ago
- [ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆2,120 Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,789 Updated 7 months ago
- NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots. ☆6,019 Updated last week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,941 Updated this week
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆7,180 Updated 8 months ago
- The best OSS video generation models, created by Genmo ☆3,581 Updated 2 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆18,370 Updated last year
- High-resolution models for human tasks. ☆5,272 Updated last year
- ☆3,152 Updated 10 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,952 Updated last year
- ☆3,463 Updated 10 months ago