MiscellaneousStuff/PhoneLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/MiscellaneousStuff/PhoneLM)

MiscellaneousStuff / PhoneLM

(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.

☆48

Alternatives and similar repositories for PhoneLM

Users that are interested in PhoneLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ex3ndr / supervoice-gpt
View on GitHub
GPT-style network for phonemization with durations of text
☆68Mar 21, 2024Updated 2 years ago
p1an-lin-jung / wv_tts
View on GitHub
☆19Mar 22, 2024Updated 2 years ago
p0p4k / Matcha-TTS-2
View on GitHub
E2E TTS using Conditional Flow Matching (Experimental*)
☆71Nov 10, 2023Updated 2 years ago
lakahaga / dc-comix-tts
View on GitHub
Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
☆74Aug 21, 2023Updated 2 years ago
y-chan / hifi-gan-misrnet
View on GitHub
unofficial pytorch implementation of HiFi-GAN with fast MISR.
☆15Mar 21, 2023Updated 3 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
ZhangXInFD / soundstorm-speechtokenizer
View on GitHub
Implementation of SoundStorm built upon SpeechTokenizer.
☆116Nov 2, 2023Updated 2 years ago
youngsheen / GPST
View on GitHub
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆70Nov 1, 2024Updated last year
anton-kashkin / hifi_vc
View on GitHub
☆25Jan 24, 2023Updated 3 years ago
adelacvg / diff-vits
View on GitHub
☆39Oct 1, 2023Updated 2 years ago
yoongi43 / VRVQ
View on GitHub
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11Apr 10, 2025Updated last year
CODEJIN / XiaoiceSing2
View on GitHub
☆19Feb 2, 2023Updated 3 years ago
sony / bigvsan
View on GitHub
Pytorch implementation of BigVSAN
☆203Dec 9, 2025Updated 7 months ago
seastar105 / pflow-encodec
View on GitHub
Implementation of TTS model based on NVIDIA P-Flow TTS Paper
☆77Jul 13, 2026Updated last week
ndkgit339 / spe-dss
View on GitHub
Speech Parameter Estimation Using Differentiable Speech Synthesizer
☆43May 9, 2023Updated 3 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18Oct 20, 2024Updated last year
zengchang233 / CrossSinger
View on GitHub
The source code for the paper CrossSinger (asru2023)
☆18Oct 12, 2023Updated 2 years ago
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
voidful / vall-e-encodec
View on GitHub
☆41May 15, 2023Updated 3 years ago
e-c-k-e-r / vall-e
View on GitHub
An unofficial PyTorch implementation of VALL-E
☆88Aug 3, 2025Updated 11 months ago
shang0712 / HierTTS
View on GitHub
☆47Apr 16, 2023Updated 3 years ago
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
CODEJIN / NaturalSpeech2
View on GitHub
☆139Jan 7, 2024Updated 2 years ago
0nutation / SLMTokBench
View on GitHub
SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
☆37Aug 29, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
shengcanxu / canoSpeech
View on GitHub
text to speech
☆10Mar 19, 2024Updated 2 years ago
lifeiteng / SoundStorm
View on GitHub
☆71Jul 13, 2023Updated 3 years ago
huutuongtu / Lightvoc
View on GitHub
LIGHTVOC AN UPSAMPLING-FREE GAN VOCODER BASED ON CONFORMER AND INVERSE SHORT-TIME FOURIER TRANSFORM
☆18May 17, 2024Updated 2 years ago
ga642381 / SpeechGen
View on GitHub
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77Jun 9, 2023Updated 3 years ago
cnaigithub / SpeechDewarping
View on GitHub
Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023
☆27Apr 27, 2023Updated 3 years ago
X-E-Speech / X-E-Speech-code
View on GitHub
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
☆112Apr 1, 2024Updated 2 years ago
choiHkk / Transformer-TTS-V2
View on GitHub
☆25Mar 6, 2024Updated 2 years ago
hhguo / SoCodec
View on GitHub
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92Dec 20, 2024Updated last year
sushant-t / tts-trainer
View on GitHub
Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using…
☆30May 27, 2023Updated 3 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
revsic / torch-nansy
View on GitHub
Torch implementation of NANSY, Neural Analysis and Synthesis, arXiv:2110.14513
☆64Feb 13, 2023Updated 3 years ago
Infinity-INF / fast-phasr
View on GitHub
Phonemes and durations labeling based on whisper small
☆11Jul 7, 2024Updated 2 years ago
WelkinYang / WaveODE
View on GitHub
An ODE-based generative neural vocoder using Rectified Flow
☆58Apr 29, 2023Updated 3 years ago
nervjack2 / MelHuBERT
View on GitHub
Official implementation of MelHuBERT
☆70Feb 21, 2026Updated 4 months ago
ajinkyakulkarni14 / ERISHA
View on GitHub
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for…
☆44Dec 17, 2020Updated 5 years ago
sungnyun / ARMHuBERT
View on GitHub
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41Aug 29, 2024Updated last year
0913ktg / SC_VALL-E
View on GitHub
Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E
☆136Oct 23, 2024Updated last year