NZqian / RapBank
☆65Updated 8 months ago
Alternatives and similar repositories for RapBank
Users who are interested in RapBank are comparing it to the repositories listed below
- A curated list of Video to Audio Generation☆45Updated last month
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025]☆159Updated 3 weeks ago
- ☆59Updated 10 months ago
- ☆67Updated 2 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆164Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆98Updated 3 weeks ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆231Updated 2 months ago
- official code for CVPR'24 paper Diff-BGM☆63Updated 7 months ago
- ☆24Updated 5 months ago
- flow mirror models from JZX AI Labs☆44Updated 8 months ago
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆99Updated 5 months ago
- A refactor of the GPT-SOVITS project: parts of the code are rewritten, the web UI is easier to use, and API calls are added.☆27Updated 5 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)☆40Updated 2 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆196Updated 3 months ago
- Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).☆109Updated 4 months ago
- ☆47Updated 4 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆74Updated last month
- Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models☆38Updated 3 months ago
- ☆19Updated 7 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆41Updated last month
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".☆198Updated 5 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆90Updated 5 months ago
- ☆200Updated last month
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆39Updated 3 weeks ago
- small audio language model for reasoning☆64Updated last month
- ☆36Updated last week
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr…☆19Updated last year
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).☆128Updated 10 months ago
- ☆78Updated 7 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆79Updated 6 months ago