NZqian / RapBank
☆58Updated 4 months ago
Alternatives and similar repositories for RapBank:
Users that are interested in RapBank are comparing it to the libraries listed below
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆138Updated 7 months ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).☆126Updated 6 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆91Updated 2 months ago
- Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).☆95Updated last week
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)☆27Updated 2 months ago
- ☆51Updated 6 months ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)☆40Updated 5 months ago
- flow mirror models from JZX AI Labs☆43Updated 3 months ago
- A refactor of the GPT-SOVITS project: parts of the code are rewritten, the webui is easier to use, and API calls are added☆20Updated last month
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆57Updated 2 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆98Updated last month
- ☆18Updated 2 weeks ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆43Updated this week
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆65Updated 3 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆74Updated 3 weeks ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video☆102Updated 6 months ago
- Follow the rapid development of AIGC models and applications. 🚀☆81Updated last year
- GPT-style network for phonemization with durations of text☆64Updated 9 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion☆78Updated 9 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".☆119Updated last week
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆92Updated 3 weeks ago
- ☆15Updated 2 months ago
- SenseVoice-python: An enterprise-grade open-source multi-language ASR system from FunASR, running on onnxruntime☆79Updated 3 months ago
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"☆72Updated 2 months ago
- Zero-Shot Emotion Style Transfer☆39Updated 9 months ago
- ☆65Updated last year
- [Official Implementation] Acoustic Autoregressive Modeling 🔥☆59Updated 4 months ago
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr…☆15Updated 10 months ago
- Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer.☆26Updated this week