feizc / FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
☆1,623Updated last week
Alternatives and similar repositories for FluxMusic:
Users who are interested in FluxMusic are comparing it to the repositories listed below.
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling☆2,493Updated this week
- Official repository for LTX-Video☆1,645Updated last week
- The best OSS video generation models☆2,253Updated this week
- Various AI scripts. Mostly Stable Diffusion stuff.☆3,512Updated this week
- first base model for full-duplex conversational audio☆1,621Updated 2 weeks ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer☆1,210Updated this week
- A general fine-tuning kit geared toward diffusion models.☆1,847Updated last week
- [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment☆2,726Updated this week
- OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340☆2,975Updated 2 weeks ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation☆2,094Updated 3 months ago
- ☆1,681Updated 3 weeks ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆3,571Updated last month
- Generative models for conditional audio generation☆2,744Updated 3 weeks ago
- Janus-Series: Unified Multimodal Understanding and Generation Models☆1,163Updated 2 weeks ago
- OpenMusic: SOTA Text-to-music (TTM) Generation☆488Updated last week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".☆829Updated last month
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆2,630Updated 2 weeks ago
- Inference and training library for high-quality TTS models.☆4,696Updated this week
- Dead simple FLUX LoRA training UI with LOW VRAM support☆1,394Updated 2 weeks ago
- 📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion☆1,550Updated this week
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.☆2,265Updated 2 weeks ago
- Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple te…☆661Updated last week
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"☆1,353Updated last week
- Local realtime voice AI☆2,014Updated last week
- Interface for OuteTTS models.☆675Updated this week
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S…☆1,596Updated last week
- ☆6,870Updated last week
- Chat first code editor. To download the packaged app:☆5,176Updated 2 weeks ago
- Fast and accurate automatic speech recognition (ASR) for edge devices☆2,273Updated this week