k-farruh / speech-accent-detection
Humans speak every language with an accent, and a particular accent reflects a person's linguistic background. This model identifies a speaker's accent from an audio recording. Its output can be used to determine which accent a speaker has, help English learners reduce their accent, and track improvement through practice.
☆62 · Updated 3 years ago
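The repository's own code is not excerpted here, so the following is only a minimal sketch of how an accent classifier of this kind might be built: MFCC statistics extracted with librosa and fed to a scikit-learn SVM. The feature set, classifier, file names, and accent labels are illustrative assumptions, not the project's actual pipeline.

```python
# Illustrative sketch only: the real speech-accent-detection pipeline may differ.
# Assumes librosa and scikit-learn; paths, labels, and hyperparameters are hypothetical.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load an audio file and summarize it as the mean and std of its MFCCs."""
    audio, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: recordings and their accent labels.
train_files = ["rec_arabic_01.wav", "rec_mandarin_01.wav", "rec_spanish_01.wav"]
train_labels = ["arabic", "mandarin", "spanish"]

X = np.stack([extract_features(f) for f in train_files])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X, train_labels)

# Score a new recording and print accent probabilities, most likely first.
probs = clf.predict_proba(extract_features("new_speaker.wav").reshape(1, -1))[0]
for accent, p in sorted(zip(clf.classes_, probs), key=lambda t: -t[1]):
    print(f"{accent}: {p:.2f}")
```

A real system would train on many recordings per accent and likely use a neural encoder rather than an SVM, but the overall structure, feature extraction followed by a multi-class classifier, is the same.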
Alternatives and similar repositories for speech-accent-detection
Users interested in speech-accent-detection are comparing it to the libraries listed below.
- PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp… ☆48 · Updated 2 years ago
- Goodness of Pronunciation using Kaldi on Epa-DB database ☆35 · Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆118 · Updated 2 years ago
- ☆68 · Updated 11 months ago
- A non-native English corpus for pronunciation scoring task ☆147 · Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆53 · Updated 3 months ago
- Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p! ☆173 · Updated last week
- Collection of pretrained models for the Montreal Forced Aligner ☆161 · Updated 2 months ago
- ☆68 · Updated 2 months ago
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆107 · Updated 6 months ago
- A mini, simple, and fast end-to-end automatic speech recognition toolkit. ☆54 · Updated 2 years ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf ☆66 · Updated 4 years ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆148 · Updated last year
- A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing. ☆101 · Updated 2 years ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆88 · Updated last year
- PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised T… ☆194 · Updated 2 years ago
- phoneme tokenizer and grapheme-to-phoneme model for 8k languages ☆169 · Updated 2 years ago
- A python library for voice activity detection (VAD) for speech/non-speech segmentation. ☆89 · Updated 2 years ago
- Building a Deep learning model that predicts the gender of a speaker using TensorFlow 2 ☆127 · Updated 2 years ago
- A speaker embedding network in Pytorch that is very quick to set up and use for whatever purposes. ☆91 · Updated 4 months ago
- A curated list of awesome voice activity detection ☆62 · Updated 9 months ago
- Putting flows on top of neural transducers for better TTS ☆63 · Updated 3 weeks ago
- The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems ☆271 · Updated last year
- 😎 Awesome lists about Speech Emotion Recognition ☆96 · Updated 8 months ago
- Toolbox for easy and qualitative one-shot voice conversion ☆46 · Updated 3 years ago
- Neural HMMs are all you need (for high-quality attention-free TTS) ☆159 · Updated 3 weeks ago
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ☆150 · Updated last year
- VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram ☆254 · Updated last year
- A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project g… ☆146 · Updated 3 years ago
- Add n-gram and large language model (LLM) support to Whisper models. ☆31 · Updated 3 months ago