☆17 · Aug 27, 2025 · Updated 6 months ago
Alternatives and similar repositories for speech
Users that are interested in speech are comparing it to the libraries listed below
- Korean ASR Corpus generated from TEDx talks ☆27 · Jan 11, 2019 · Updated 7 years ago
- Implementation of the Rhythm Formant Analysis methodology for identifying speech rhythms and rhythm variation in the low frequency spectr… ☆17 · Apr 27, 2023 · Updated 2 years ago
- ☆15 · May 8, 2021 · Updated 4 years ago
- PyTorch implementation of Retriever: Learning Content-Style Representation ☆12 · Jan 27, 2023 · Updated 3 years ago
- This repo contains the baseline model recipes and pre-trained model for the Gram Vaani Hindi ASR challenge ☆15 · Mar 26, 2022 · Updated 3 years ago
- Implementation of different noise embeddings for noise-aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- **ICASSP 2022** "Toward Degradation-Robust Voice Conversion": using speech enhancement and end-to-end denoising training to improve degrada… ☆24 · Sep 27, 2022 · Updated 3 years ago
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆80 · Sep 27, 2023 · Updated 2 years ago
- A CSRankings-like index for speech researchers ☆35 · Oct 16, 2024 · Updated last year
- Simple tool for speech dataset augmentation for modeling various prosodies. ☆14 · Jan 14, 2021 · Updated 5 years ago
- A package for crawling audio from YouTube ☆42 · Aug 8, 2023 · Updated 2 years ago
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆48 · Sep 4, 2023 · Updated 2 years ago
- This repository provides data and code for "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription" paper. ☆16 · Jul 22, 2021 · Updated 4 years ago
- Official implementation of BVAE-TTS ☆173 · Sep 26, 2022 · Updated 3 years ago
- Lightweight speaker anonymization [IEEE SLT2021] ☆27 · Jun 6, 2022 · Updated 3 years ago
- Implementation of the paper "BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition" ☆17 · Dec 10, 2020 · Updated 5 years ago
- ☆70 · Jan 7, 2021 · Updated 5 years ago
- A neural language modeling toolkit built on PyTorch ☆19 · Mar 17, 2023 · Updated 2 years ago
- ☆17 · Apr 14, 2023 · Updated 2 years ago
- Torch implementation of NANSY, Neural Analysis and Synthesis, arXiv:2110.14513 ☆64 · Feb 13, 2023 · Updated 3 years ago
- Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly es… ☆19 · Jun 14, 2021 · Updated 4 years ago
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z… ☆32 · Apr 8, 2022 · Updated 3 years ago
- ☆16 · Jun 13, 2022 · Updated 3 years ago
- Google's TPGST reimplementation. ☆34 · Dec 11, 2019 · Updated 6 years ago
- A duration-invariant audio-to-lyrics alignment pipeline with low memory footprint which segments long music recordings via a recursive bi… ☆15 · Oct 13, 2022 · Updated 3 years ago
- Text-To-Speech for NotebookLM ☆39 · Jul 20, 2025 · Updated 7 months ago
- Implementation of the AlignTTS ☆77 · Jul 6, 2023 · Updated 2 years ago
- Visualizing the Music Transformer attention ☆26 · Nov 15, 2019 · Updated 6 years ago
- End-to-end Text-to-Speech with Generative Adversarial Networks ☆20 · Feb 6, 2021 · Updated 5 years ago
- Jejueo Datasets for Machine Translation and Speech Synthesis ☆83 · Feb 19, 2020 · Updated 6 years ago
- Code for paper A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing ☆89 · Sep 6, 2024 · Updated last year
- ☆24 · Mar 13, 2020 · Updated 5 years ago
- Speaker embedding for VI-SVC and VI-SVS, also for VITS; use this to replace the speaker ID to implement voice cloning. ☆30 · Sep 16, 2022 · Updated 3 years ago
- Please visit: https://thuhcsi.github.io/icassp2021-emotion-tts/ ☆34 · Mar 17, 2023 · Updated 2 years ago
- Baseline Recipe for VoicePrivacy Challenge 2020: https://www.voiceprivacychallenge.org/vp2020/docs/VoicePrivacy_2020_Eval_Plan_v1_3.pdf ☆64 · Jul 6, 2023 · Updated 2 years ago
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP 2024) ☆26 · Feb 22, 2024 · Updated 2 years ago
- Temporary anonymous version ☆22 · Mar 20, 2024 · Updated last year
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 · Aug 5, 2024 · Updated last year
- Emotion detection in audio utilising self-supervised representations trained with Contrastive Predictive Coding (CPC). ☆43 · Feb 16, 2022 · Updated 4 years ago