NZqian / RapBank
☆65 Updated 8 months ago
Alternatives and similar repositories for RapBank
Users who are interested in RapBank are comparing it to the repositories listed below.
- A curated list of Video to Audio Generation ☆45 Updated last month
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆165 Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆99 Updated 3 weeks ago
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025] ☆159 Updated last month
- ☆59 Updated 11 months ago
- ☆67 Updated 2 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆163 Updated last week
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… ☆99 Updated 5 months ago
- ☆24 Updated 5 months ago
- ☆47 Updated 4 months ago
- Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24). ☆109 Updated 4 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆74 Updated last month
- ☆78 Updated 7 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆90 Updated 5 months ago
- Official code for the CVPR'24 paper Diff-BGM ☆63 Updated 7 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆42 Updated 2 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆231 Updated 2 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆196 Updated 3 months ago
- ☆94 Updated 6 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆70 Updated last week
- Flow mirror models from JZX AI Labs ☆44 Updated 8 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆198 Updated 5 months ago
- Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models ☆38 Updated 3 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆39 Updated last month
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆72 Updated 7 months ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆46 Updated 9 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆118 Updated 2 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation ☆87 Updated 3 weeks ago
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆208 Updated last year
- GPT-style network for phonemization with durations of text ☆66 Updated last year