k-farruh / speech-accent-detection
Everyone speaks a language with an accent, and a particular accent reflects the speaker's linguistic background. The model identifies an accent from an audio recording. Its output could be used to detect accents and to help students learning English reduce or improve their accents through training.
☆62Updated 3 years ago
Alternatives and similar repositories for speech-accent-detection
Users who are interested in speech-accent-detection are comparing it to the libraries listed below
- PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp…☆48Updated 2 years ago
- Zero-shot multimodal punctuation insertion and truecasing using Whisper☆119Updated 2 years ago
- Reproducible experimental protocols for multimedia (audio, video, text) databases☆107Updated last month
- A non-native English corpus for pronunciation scoring task☆157Updated this week
- Goodness of Pronunciation using Kaldi on Epa-DB database☆35Updated last year
- ☆67Updated 4 months ago
- Speaker change detection using SincNet and an LSTM/Transformer☆55Updated 5 months ago
- Collection of pretrained models for the Montreal Forced Aligner☆172Updated 3 weeks ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆151Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆90Updated 2 years ago
- A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.☆107Updated 2 years ago
- Phoneme tokenizer and grapheme-to-phoneme model for 8k languages☆170Updated 2 years ago
- Various speech datasets made available to the public☆131Updated 10 months ago
- Phoneme segmentation using pre-trained speech models☆55Updated 2 years ago
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code☆151Updated last year
- ☆69Updated last year
- A mini, simple, and fast end-to-end automatic speech recognition toolkit.☆53Updated 2 years ago
- Fine-Tune Whisper with Transformers and PEFT☆57Updated last year
- Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".☆187Updated 2 years ago
- A speaker embedding network in PyTorch that is very quick to set up and use for whatever purposes.☆90Updated 6 months ago
- Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together☆47Updated 3 months ago
- Timething is a library for aligning text transcripts with their audio recordings.☆124Updated 10 months ago
- Goodness of Pronunciation (GOP) for oral reading assessment.☆52Updated 3 years ago
- Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!☆177Updated this week
- ☆27Updated 4 years ago
- Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions☆262Updated 9 months ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf☆66Updated 4 years ago
- Multilingual G2P in 100 languages☆361Updated 2 years ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO☆65Updated 3 years ago
- An easy way to fine-tune Wav2Vec 2.0 for low-resource languages.☆81Updated 2 years ago