speechsuper / SpeechSuper-API-Samples
Deep learning based speech and pronunciation assessment API for 8 languages.
☆43Updated last year
Alternatives and similar repositories for SpeechSuper-API-Samples
Users who are interested in SpeechSuper-API-Samples are comparing it to the libraries listed below.
- ☆38Updated last year
- A non-native English corpus for pronunciation scoring task☆143Updated 11 months ago
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…☆232Updated 3 years ago
- Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".☆177Updated 2 years ago
- ☆15Updated 2 months ago
- Collection of pretrained models for the Montreal Forced Aligner☆155Updated last week
- ONNX Inference of Pyannote Segmentation☆91Updated 6 months ago
- Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)☆120Updated 3 years ago
- PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean,…☆302Updated 3 years ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition☆15Updated last year
- Code showing how to transcribe audio to text using fairseq_meta_mms (Google Colab version)☆18Updated 2 years ago
- Charsiu: A neural phonetic aligner.☆305Updated 2 years ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io☆68Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper☆115Updated 2 years ago
- 😎 Awesome lists about Speech Emotion Recognition☆91Updated 6 months ago
- Official Implementation of StyleTTS-VC☆184Updated 5 months ago
- Monotonic Alignment Search☆94Updated 2 weeks ago
- PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised T…☆194Updated 2 years ago
- A public domain single speaker Japanese speech dataset☆55Updated last year
- A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model.☆24Updated last month
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.☆80Updated 7 months ago
- Fine-tune the LLM part of the Spark-TTS model☆85Updated 3 months ago
- Multilingual G2P in 100 languages☆331Updated 2 years ago
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.☆312Updated 2 years ago
- Goodness of Pronunciation (GOP) for oral reading assessment.☆52Updated 3 years ago
- Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions☆258Updated 5 months ago
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3☆202Updated last year
- Fine-Tune Whisper with Transformers and PEFT☆57Updated last year
- Timething is a library for aligning text transcripts with their audio recordings.☆121Updated 6 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆84Updated last year