eloialonso / diamond
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
☆1,966 Updated last year
Alternatives and similar repositories for diamond
Users who are interested in diamond are comparing it to the libraries listed below.
- Inference script for Oasis 500M ☆2,042 Updated last year
- A suite of image and video neural tokenizers ☆1,704 Updated last year
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,548 Updated last year
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,954 Updated 5 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,499 Updated 11 months ago
- Mastering Diverse Domains through World Models ☆2,758 Updated 4 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,789 Updated 8 months ago
- MineWorld: A Real-time interactive world model on Minecraft ☆442 Updated 6 months ago
- The best OSS video generation models, created by Genmo ☆3,595 Updated 2 months ago
- code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion" ☆1,152 Updated 3 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,950 Updated 2 weeks ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,251 Updated 11 months ago
- A minimal implementation of DeepMind's Genie world model ☆1,140 Updated 2 months ago
- Continuous Thought Machines, because thought takes time and reasoning is a process. ☆1,750 Updated last month
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,928 Updated last year
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models ☆1,554 Updated 8 months ago
- [ICCV'25] DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion ☆1,327 Updated 3 months ago
- The first behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks. ☆729 Updated 8 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,085 Updated last year
- The official implementation of CVPR'25 Oral paper "Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped No… ☆1,064 Updated 3 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,448 Updated 7 months ago
- Next-Token Prediction is All You Need ☆2,339 Updated 3 weeks ago
- Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model ☆2,664 Updated last month
- ☆321 Updated 8 months ago
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos ☆1,637 Updated 5 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,737 Updated 2 months ago
- Unifying 3D Mesh Generation with Language Models ☆1,136 Updated 10 months ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,147 Updated 2 weeks ago
- Official Repository for "DrEureka: Language Model Guided Sim-To-Real Transfer" (RSS 2024) ☆918 Updated last year
- Code release for https://kovenyu.com/WonderWorld/ ☆708 Updated 9 months ago