feizc / FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
☆1,667 Updated 2 months ago
Alternatives and similar repositories for FluxMusic:
Users who are interested in FluxMusic are comparing it to the repositories listed below.
- [arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,096 Updated this week
- Official repository for LTX-Video ☆2,857 Updated this week
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆801 Updated this week
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,307 Updated 3 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,251 Updated last week
- Generative models for conditional audio generation ☆2,896 Updated last month
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆647 Updated 3 weeks ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,386 Updated last week
- A fast multimodal LLM for real-time voice ☆3,589 Updated last week
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,780 Updated last month
- Interface for OuteTTS models. ☆923 Updated this week
- Taming Stable Diffusion for Lip Sync! ☆2,538 Updated last month
- Local realtime voice AI ☆2,224 Updated this week
- The best OSS video generation models ☆2,915 Updated last month
- first base model for full-duplex conversational audio ☆1,707 Updated last month
- A general fine-tuning kit geared toward diffusion models. ☆2,092 Updated last week
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling" ☆1,439 Updated last month
- STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution ☆939 Updated 3 weeks ago
- FastVideo is a lightweight framework for accelerating large video diffusion models. ☆1,095 Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆7,506 Updated last week
- Select a portrait, click to move the head around (please use your own space / GPU!) ☆820 Updated 2 months ago
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆519 Updated 6 months ago
- TTS with kokoro and onnx runtime ☆1,596 Updated this week
- OpenMusic: SOTA Text-to-music (TTM) Generation ☆529 Updated last month