thuhcsi / dpss-exp3-VC-BNF
Voice Conversion Experiments for THUHCSI Course: <Digital Processing of Speech Signals>
☆17Updated 9 months ago
Alternatives and similar repositories for dpss-exp3-VC-BNF
Users interested in dpss-exp3-VC-BNF are comparing it to the repositories listed below:
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆29Updated 5 months ago
- ☆134Updated 2 years ago
- STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation☆42Updated last month
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆60Updated 7 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆47Updated 2 months ago
- Official Repository of Paper: "NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizat…☆40Updated 3 weeks ago
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words☆50Updated last year
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆53Updated 2 months ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).☆127Updated last year
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)☆57Updated 6 months ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…☆44Updated last year
- Chinese-Mimi adapts the vocoder of the Moshi model to Chinese-language corpora.☆29Updated 5 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)☆51Updated 5 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆106Updated 3 months ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"☆72Updated 3 months ago
- ☆37Updated 5 months ago
- ☆42Updated 7 months ago
- The demo page for ALMTokenizer☆53Updated 4 months ago
- Source code for DM-Codec.☆48Updated 3 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion.☆26Updated 11 months ago
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts☆34Updated 2 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆60Updated 10 months ago
- Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆165Updated 2 months ago
- ☆19Updated 11 months ago
- This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language…☆15Updated last year
- Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"☆36Updated 2 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆38Updated last year
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆140Updated 3 months ago
- [ICCV 2025] SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer☆287Updated 8 months ago
- ☆53Updated last month