eloialonso / diamond
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
☆1,509 Updated this week
Related projects
Alternatives and complementary repositories for diamond
- The best OSS video generation models ☆1,804 Updated this week
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,209 Updated last week
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation ☆913 Updated last week
- Text-to-Music Generation with Rectified Flow Transformers ☆1,592 Updated 2 months ago
- OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340 ☆2,018 Updated this week
- A general fine-tuning kit geared toward diffusion models. ☆1,773 Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,823 Updated 3 months ago
- ☆601 Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,600 Updated last month
- Mora: More like Sora for Generalist Video Generation ☆1,513 Updated last month
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,664 Updated 3 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,070 Updated 3 months ago
- An MLX port of FLUX based on the Huggingface Diffusers implementation. ☆928 Updated last week
- first base model for full-duplex conversational audio ☆1,248 Updated this week
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ☆2,790 Updated last week
- Distributed Training Over-The-Internet ☆680 Updated 2 months ago
- ☆1,595 Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,532 Updated last month
- VideoSys: An easy and efficient system for video generation ☆1,761 Updated this week
- High-resolution models for human tasks. ☆4,455 Updated 2 weeks ago
- Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple te… ☆548 Updated last week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆553 Updated this week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆753 Updated last week
- nanoGPT style version of Llama 3.1 ☆1,231 Updated 3 months ago
- ☆2,824 Updated 3 weeks ago
- [ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models ☆421 Updated 2 months ago
- [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment ☆2,563 Updated last week
- Codebase for Aria - an Open Multimodal Native MoE ☆779 Updated this week