kgnlp / allophant
A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.
☆20 Updated last week
Alternatives and similar repositories for allophant:
Users who are interested in allophant are comparing it to the repositories listed below.
- ☆21 Updated 6 months ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model ☆32 Updated last year
- Collection of scripts from mHuBERT-147. ☆24 Updated 4 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆21 Updated 6 months ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Updated last year
- C++ version of pyannote audio overlapped speech detection pipeline ☆12 Updated last year
- ☆12 Updated last month
- 56 language, 1 model Multilingual ASR ☆25 Updated 3 years ago
- Prosodic Speech Segmentation with Transformers ☆25 Updated last year
- ☆18 Updated 10 months ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆25 Updated 7 months ago
- phone inventory library ☆16 Updated last year
- ☆10 Updated 3 weeks ago
- A handy dataset of noises for ASR ☆20 Updated 5 years ago
- A Python-based modular toolbox for building Deep Neural Network models (using PyTorch) for statistical parametric speech synthesis ☆23 Updated 3 years ago
- ☆13 Updated 7 months ago
- Implementation of the paper "BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition" ☆17 Updated 4 years ago
- Kaldi style neural network training in pytorch for use in place of nnet3 in Kaldi. ☆26 Updated 7 months ago
- DUSTED: Spoken-Term Discovery using Discrete Speech Units ☆15 Updated 5 months ago
- ☆36 Updated 6 months ago
- ☆17 Updated 3 years ago
- ☆9 Updated 5 years ago
- This repo is related to the paper "A Framework for Phoneme-Level Pronunciation Assessment Using CTC" for INTERSPEECH 2024 ☆18 Updated 4 months ago
- asr2k ☆49 Updated 9 months ago
- Forced alignment decoder for Whisper. ☆14 Updated last year
- Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly es… ☆19 Updated 3 years ago
- ☆22 Updated last month
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 Updated last year
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models ☆50 Updated 2 weeks ago