Survey on speech generation work.
☆21 Nov 26, 2023 Updated 2 years ago
Alternatives and similar repositories for Awesome-Speech-Generation
Users that are interested in Awesome-Speech-Generation are comparing it to the libraries listed below.
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" ☆37 Aug 29, 2023 Updated 2 years ago
- A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022 ☆38 Nov 30, 2023 Updated 2 years ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 Nov 7, 2024 Updated last year
- This Repository surveys the paper focusing on Prompting and Adapters for Speech Processing. ☆111 Aug 4, 2023 Updated 2 years ago
- Temporary anonymous version ☆22 Mar 20, 2024 Updated 2 years ago
- The open source code for LLM-Codec ☆146 Aug 18, 2024 Updated last year
- ☆16 Apr 4, 2022 Updated 4 years ago
- ☆30 Jul 21, 2022 Updated 3 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge ☆21 Jul 25, 2022 Updated 3 years ago
- ☆39 Apr 15, 2024 Updated 2 years ago
- Reference-aware automatic speech evaluation toolkit ☆181 Dec 5, 2024 Updated last year
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 Jun 7, 2024 Updated last year
- ☆13 Sep 25, 2024 Updated last year
- Audio Codec Speech processing Universal PERformance Benchmark ☆301 Apr 1, 2026 Updated 2 weeks ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024 ☆20 Jul 27, 2024 Updated last year
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions" ☆50 Apr 7, 2025 Updated last year
- ☆18 Sep 22, 2025 Updated 6 months ago
- ☆24 Jun 13, 2022 Updated 3 years ago
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement" ☆14 Feb 13, 2022 Updated 4 years ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 Jul 2, 2024 Updated last year
- S3PRL for Speech Emotion Recognition (see s3prl > downstream) ☆15 Feb 28, 2026 Updated last month
- Source code of APNet2, a vocoder ☆58 Nov 23, 2023 Updated 2 years ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆61 Jun 7, 2024 Updated last year
- Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech und… ☆42 Mar 12, 2023 Updated 3 years ago
- text to speech ☆10 Mar 19, 2024 Updated 2 years ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆71 Nov 10, 2023 Updated 2 years ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Jun 2, 2023 Updated 2 years ago
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space. ☆248 Mar 7, 2025 Updated last year
- ☆15 Nov 10, 2025 Updated 5 months ago
- Keep track of big models in audio domain, including speech, singing, music etc. ☆508 Sep 26, 2024 Updated last year
- Awesome speech/audio LLMs, representation learning, and codec models ☆1,219 Apr 4, 2026 Updated 2 weeks ago
- proof of concept conversation orchestrator with a speech-language model ☆20 Oct 19, 2024 Updated last year
- ☆13 Sep 12, 2024 Updated last year
- Official implementation of MelHuBERT ☆70 Feb 21, 2026 Updated last month
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆75 Aug 24, 2024 Updated last year
- Official Implementation of LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models. ☆34 Nov 9, 2025 Updated 5 months ago
- Project of Singing Voice Conversion. ☆16 Oct 27, 2023 Updated 2 years ago
- CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding ☆23 Dec 17, 2025 Updated 4 months ago
- This repository includes the code to reproduce our paper "End-to-end anti-spoofing with RawNet2" (https://arxiv.org/abs/2011.01108) publi… ☆67 Aug 8, 2023 Updated 2 years ago