speechsuper / SpeechSuper-API-Samples
Deep learning based speech and pronunciation assessment API for 8 languages.
☆42 Updated last year
Alternatives and similar repositories for SpeechSuper-API-Samples
Users who are interested in SpeechSuper-API-Samples are comparing it to the libraries listed below.
- ☆38 Updated last year
- ☆15 Updated last month
- Goodness of Pronunciation (GOP) for oral reading assessment. ☆52 Updated 3 years ago
- Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment". ☆173 Updated 2 years ago
- A non-native English corpus for the pronunciation scoring task ☆141 Updated 10 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆98 Updated 7 months ago
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆114 Updated 2 years ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 Updated last year
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io ☆67 Updated last year
- Spoken language assessment ☆43 Updated 4 years ago
- Application of MB-iSTFT-VITS components to vits2_pytorch ☆126 Updated 6 months ago
- Fine-tune the LLM part of the spark-tts model ☆79 Updated 2 months ago
- ONNX Inference of Pyannote Segmentation ☆90 Updated 5 months ago
- Voice gender classifier using ECAPA-TDNN ☆43 Updated 4 months ago
- Fine-Tune Whisper with Transformers and PEFT ☆57 Updated last year
- ☆57 Updated 11 months ago
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ☆149 Updated last year
- ☆25 Updated 2 years ago
- Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment) ☆119 Updated 2 years ago
- Universal multilingual automatic speech transcription into IPA ☆65 Updated 3 months ago
- Timething is a library for aligning text transcripts with their audio recordings. ☆119 Updated 6 months ago
- ☆79 Updated last year
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription ☆81 Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆83 Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction ☆94 Updated last year
- A curated list of speech datasets (110+ datasets, 75+ easy to download) ☆132 Updated 2 years ago
- Python forced alignment ☆89 Updated last year
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆139 Updated 7 months ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆254 Updated last year
- VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods. ☆36 Updated 2 years ago