cmeraki / audiotoken

Audio tokenization, in the fastest way possible!

☆45

Related projects

Alternatives and complementary repositories for audiotoken

utter-project / mHuBERT-147-scripts
Collection of scripts from mHuBERT-147.
☆22 Updated 4 months ago
NeuralVox / OpenPhonemizer
An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP…
☆83 Updated last month
ryanrudes / YTTTS
The YouTube Text-To-Speech dataset consists of waveform audio extracted from YouTube videos alongside their English transcriptions.
☆50 Updated 3 years ago
parrot-tts / Parrot-TTS
Official Code for ParrotTTS
☆41 Updated 3 weeks ago
LAION-AI / Text-to-speech
☆59 Updated last year
ex3ndr / supervoice-gpt-facodec
GPT for FACodec
☆13 Updated 7 months ago
ex3ndr / supervoice-gpt
GPT-style network for phonemization with durations of text
☆62 Updated 7 months ago
Edresson / ZS-TTS-Evaluation
☆32 Updated last month
ex3ndr / supervoice-enhance
Supervoice diffusion enhance
☆25 Updated 3 months ago
apple / pytorch-speech-features
☆84 Updated 7 months ago
utter-project / fairseq
This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.
☆12 Updated 5 months ago
kyegomez / USM
Implementation of Google's USM speech model in PyTorch
☆25 Updated this week
X-E-Speech / X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
☆65 Updated 7 months ago
speechnovateur / languagecodec_tmp
Temporary anonymous version
☆22 Updated 7 months ago
ashi-ta / speechGLUE
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13 Updated last year
xincanfeng / vitsGPT
☆43 Updated 4 months ago
warisqr007 / ppg2ppg
Zero-Shot Foreign Accent Conversion without a Native Reference
☆28 Updated 6 months ago
AbrahamSanders / codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
☆46 Updated last month
kehanlu / DeSTA2
☆28 Updated this week
MiniXC / LightningFastSpeech2
☆56 Updated last year
nivibilla / efficient-vits-finetuning
Finetuning VITS Efficiently
☆32 Updated last year
VoiceBank-NTPU-TW / VoiceBank-2023
VoiceBank-2023 is a speech corpus designed for building personalized Mandarin text-to-speech (TTS) systems.
☆36 Updated last year
DanielLin94144 / StyleTalk
Official release of StyleTalk dataset.
☆57 Updated 4 months ago
NVIDIA / RAD-MMM
A TTS model that makes a speaker speak new languages
☆75 Updated 4 months ago
egorsmkv / asr-corpus-creator
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
☆27 Updated 8 months ago
freds0 / CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
☆29 Updated 3 months ago
sanchit-gandhi / whisper-flash-attention
☆19 Updated last year
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆90 Updated 2 weeks ago
DakeQQ / F5-TTS-ONNX
Running F5-TTS with ONNX Runtime
☆25 Updated this week
AlanBaade / SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆35 Updated 3 weeks ago