CUNY-CL/wikipron

Massively multilingual pronunciation mining

☆371

Alternatives and similar repositories for wikipron

Users interested in wikipron are comparing it to the libraries listed below.

dmort27 / epitran
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
☆827 · Jun 18, 2026 · Updated last month
uiuc-sst / g2ps
Data and code for grapheme-to-phoneme transducers in lots of languages
☆152 · Apr 5, 2024 · Updated 2 years ago
axelspringer / DeepPhonemizer
Grapheme to phoneme conversion with deep learning.
☆432 · Dec 8, 2023 · Updated 2 years ago
xinjli / transphone
phoneme tokenizer and grapheme-to-phoneme model for 8k languages
☆174 · Jun 9, 2023 · Updated 3 years ago
sigmorphon / 2020
SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio…
☆36 · Apr 25, 2025 · Updated last year
lumaku / ctc-segmentation
Segment an audio file and obtain utterance alignments. (Python package)
☆348 · May 15, 2024 · Updated 2 years ago
xinjli / phonepiece
phone inventory library
☆17 · May 15, 2023 · Updated 3 years ago
AdolfVonKleist / Phonetisaurus
Phonetisaurus G2P
☆517 · Jun 1, 2024 · Updated 2 years ago
lingjzhu / CharsiuG2P
Multilingual G2P in 100 languages
☆390 · May 26, 2023 · Updated 3 years ago
NRC-ILT / g2p
Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
☆203 · Updated this week
google-research-datasets / WikipediaHomographData
Labeled data for homograph disambiguation
☆62 · Jun 1, 2023 · Updated 3 years ago
CUNY-CL / wikipron-modeling
Proposed splits for the LREC Wikipron paper
☆15 · Apr 7, 2020 · Updated 6 years ago
lingjzhu / charsiu
Charsiu: A neural phonetic aligner.
☆347 · Sep 19, 2022 · Updated 3 years ago
bootphon / phonemizer
Simple text to phones converter for multiple languages
☆1,558 · Sep 26, 2024 · Updated last year
Kyubyong / g2p
g2p: English Grapheme To Phoneme Conversion
☆927 · Jan 5, 2023 · Updated 3 years ago
xinjli / allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
☆737 · Apr 26, 2024 · Updated 2 years ago
miccio-dk / NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16 · Apr 13, 2022 · Updated 4 years ago
dmort27 / panphon
Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
☆318 · Oct 22, 2025 · Updated 9 months ago
JRMeyer / common-voice-forced-alignments
Forced Alignments for Common Voice
☆33 · Oct 30, 2020 · Updated 5 years ago
open-dict-data / ipa-dict
Monolingual wordlists with pronunciation information in IPA
☆786 · May 24, 2025 · Updated last year
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48 · Sep 4, 2023 · Updated 2 years ago
xinjli / alqalign
multilingual speech aligner
☆78 · Nov 19, 2023 · Updated 2 years ago
xinjli / ucla-phonetic-corpus
Dataset of ICASSP 2021 MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION
☆46 · May 12, 2023 · Updated 3 years ago
pavelsof / ipatok
IPA tokeniser
☆19 · Jul 28, 2025 · Updated 11 months ago
neosapience / editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022)
☆122 · Jan 24, 2023 · Updated 3 years ago
amazon-science / proteno
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…
☆45 · May 25, 2021 · Updated 5 years ago
sequitur-g2p / sequitur-g2p
This is a github repository of the abandonware Sequitur G2P by Bisani & Ney
☆174 · Dec 16, 2025 · Updated 7 months ago
lingjzhu / zipa
A family of efficient speech models for multilingual phone recognition
☆68 · Updated this week
NVIDIA / radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, …
☆291 · Apr 6, 2023 · Updated 3 years ago
google-research / nisaba
Finite-state script normalization and processing utilities
☆52 · Jun 24, 2026 · Updated last month
kirbyj / praatsauce
Praat-based tools for spectral analysis
☆37 · May 28, 2026 · Updated last month
FastTrackiverse / fasttrackpy
A fasttrack implementation in python
☆13 · Feb 10, 2026 · Updated 5 months ago
rishikksh20 / Avocodo-pytorch
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
☆122 · Jul 14, 2022 · Updated 4 years ago
coqui-ai / open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
☆1,397 · Jun 6, 2024 · Updated 2 years ago
MiniXC / LightningFastSpeech2
☆55 · Jan 13, 2023 · Updated 3 years ago
kylebgorman / pynini
Read-only mirror of Pynini
☆170 · Sep 4, 2025 · Updated 10 months ago
LAION-AI / Text-to-speech
☆61 · Nov 4, 2023 · Updated 2 years ago
festvox / datasets-CMU_Wilderness
CMU Wilderness Multilingual Speech Dataset
☆292 · Apr 20, 2019 · Updated 7 years ago
rhasspy / gruut
A tokenizer, text cleaner, and phonemizer for many human languages.
☆330 · Nov 15, 2024 · Updated last year