NZqian / RapBank
☆65Updated 8 months ago
Alternatives and similar repositories for RapBank
Users that are interested in RapBank are comparing it to the libraries listed below
- A curated list of Video to Audio Generation☆45Updated last month
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆99Updated 3 weeks ago
- ☆67Updated 2 months ago
- Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).☆109Updated 4 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆165Updated last year
- ☆24Updated 5 months ago
- ☆47Updated 4 months ago
- official code for CVPR'24 paper Diff-BGM☆63Updated 7 months ago
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025]☆159Updated 3 weeks ago
- ☆59Updated 10 months ago
- A refactor of the GPT-SOVITS project: parts of the code are rewritten, the webui is easier to use, and API calls are added.☆27Updated 5 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation☆87Updated 3 weeks ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆39Updated 3 weeks ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)☆42Updated 2 months ago
- Follow the rapid development of AIGC models and applications. 🚀☆81Updated last year
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆231Updated 2 months ago
- Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models☆38Updated 3 months ago
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆99Updated 5 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts).☆75Updated this week
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆27Updated 3 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆74Updated last month
- ☆94Updated 6 months ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).☆128Updated 10 months ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆58Updated last month
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆79Updated 7 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆41Updated last month
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆46Updated 8 months ago
- XMIDI Dataset: A large-scale symbolic music dataset with emotion and genre labels.☆22Updated 4 months ago
- flow mirror models from JZX AI Labs☆44Updated 8 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆196Updated 3 months ago