speechsuper / SpeechSuper-API-Samples
Deep learning based speech and pronunciation assessment API for 8 languages.
☆30Updated 5 months ago
Related projects
Alternatives and complementary repositories for SpeechSuper-API-Samples
- A non-native English corpus for pronunciation scoring task☆112Updated 4 months ago
- Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".☆152Updated last year
- Barkify: an unofficial training implementation of Bark TTS by suno-ai☆126Updated last year
- Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event …☆323Updated 9 months ago
- A testing repo to share code and thoughts on diarisation☆53Updated 7 months ago
- ONNX Inference of Pyannote Segmentation☆66Updated 2 months ago
- Awesome TTS☆54Updated 3 years ago
- This tool uses AI to evaluate your pronunciation.☆153Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction☆68Updated 6 months ago
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.☆259Updated last year
- Text to speech alignment using CTC forced alignment☆141Updated 3 weeks ago
- Application of MB-iSTFT-VITS components to vits2_pytorch☆117Updated this week
- Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)☆289Updated this week
- This code shows how to transcribe audio to text using fairseq_meta_mms (Google Colab version)☆18Updated last year
- Official Implementation of StyleTTS☆401Updated 11 months ago
- Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023☆202Updated last year
- Google's SoundStorm: Efficient Parallel Audio Generation☆129Updated last year
- 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.☆209Updated 5 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines☆84Updated 6 months ago
- VALL-E 2 reproduction☆87Updated 4 months ago
- The human speaks a language with an accent. A particular accent necessarily reflects a person's linguistic background. The model defines …☆54Updated 3 years ago
- FastAPI service on top of WhisperX☆41Updated this week
- Timething is a library for aligning text transcripts with their audio recordings.☆103Updated last year
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live transcription☆54Updated 5 months ago
- Split long audio files based on subtitle info in an SRT file (transcript saved as CSV)☆18Updated 5 years ago
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io☆66Updated last year
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…☆209Updated 2 years ago