NVIDIA / Cosmos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
☆8,056 Updated 3 months ago
Alternatives and similar repositories for Cosmos
Users who are interested in Cosmos are comparing it to the libraries listed below.
- NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills. ☆4,844 Updated last week
- A suite of image and video neural tokenizers ☆1,670 Updated 6 months ago
- A generative world for general-purpose robotics & embodied AI learning. ☆27,205 Updated this week
- ☆4,526 Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,528 Updated last month
- High-resolution models for human tasks. ☆5,134 Updated 9 months ago
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ☆3,750 Updated 5 months ago
- Unified framework for robot learning built on NVIDIA Isaac Sim ☆4,769 Updated this week
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,153 Updated last week
- Reference PyTorch implementation and models for DINOv3 ☆6,453 Updated this week
- [IROS 2025] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems ☆2,338 Updated last week
- SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆3,928 Updated last week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,799 Updated 3 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,993 Updated this week
- Open-source unified multimodal model ☆4,948 Updated 2 weeks ago
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬 ☆11,461 Updated 4 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,472 Updated this week
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆12,383 Updated 3 months ago
- s1: Simple test-time scaling ☆6,541 Updated 2 months ago
- MAGI-1: Autoregressive Video Generation at Scale ☆3,475 Updated 2 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,326 Updated last month
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,800 Updated 4 months ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆1,594 Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,764 Updated 3 months ago
- Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your resea… ☆4,883 Updated 2 weeks ago
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆6,304 Updated 4 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,198 Updated 6 months ago
- Wan: Open and Advanced Large-Scale Video Generative Models ☆13,922 Updated last month
- ☆3,461 Updated 6 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,050 Updated last year