light1726/Speech-Tokenization-Papers

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/light1726/Speech-Tokenization-Papers)

light1726 / Speech-Tokenization-Papers

This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language modeling.

☆15

Alternatives and similar repositories for Speech-Tokenization-Papers

Users that are interested in Speech-Tokenization-Papers are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LingweiMeng / QualifyingExamPreparing
View on GitHub
Qualifying Exam Preparing
☆18May 7, 2025Updated last year
Shy-98 / MELLE
View on GitHub
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆41Jun 28, 2025Updated last year
LingweiMeng / MyChatGPT
View on GitHub
A casual and simple ChatGPT Python script that can run using terminal (as long as you have an API). Support Azure API.
☆20May 3, 2025Updated last year
kjw11 / Speaker-Aware-CTC
View on GitHub
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22May 26, 2025Updated last year
chenxie95 / deeplearning_course_sjtu
View on GitHub
☆18Apr 8, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
pengzhendong / streaming-tts-webui
View on GitHub
Streaming Text to Speech Web UI
☆22May 6, 2024Updated 2 years ago
e-c-k-e-r / vall-e
View on GitHub
An unofficial PyTorch implementation of VALL-E
☆88Aug 3, 2025Updated 11 months ago
voidful / Codec-SUPERB
View on GitHub
Audio Codec Speech processing Universal PERformance Benchmark
☆308Jul 4, 2026Updated 3 weeks ago
RicherMans / Dasheng
View on GitHub
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86Nov 7, 2025Updated 8 months ago
jacquelineCelia / lexicon_discovery
View on GitHub
Source code for "Unsupervised Lexicon Discovery from Acoustic Input ", Lee et al, 2015 TACL
☆10Aug 11, 2016Updated 9 years ago
y-ren16 / OV-InstructTTS
View on GitHub
☆22Jan 27, 2026Updated 5 months ago
HuangZiliAndy / SSL_for_multitalker
View on GitHub
ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS
☆33Mar 16, 2023Updated 3 years ago
jiamingkong / slam_asr_pytorch
View on GitHub
Re-implementation of SLAM-ASR paper's experiment, using Phi-2 and Hubert
☆22Jun 14, 2024Updated 2 years ago
Aisaka0v0 / TS-Whisper
View on GitHub
☆33Jun 12, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
bfs18 / e2_tts
View on GitHub
☆69Sep 3, 2024Updated last year
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125Jun 4, 2025Updated last year
tarepan / SpeechMOS
View on GitHub
Easy-to-Use Speech MOS predictors
☆360Oct 24, 2023Updated 2 years ago
ftshijt / Interspeech2024_DiscreteSpeechChallenge
View on GitHub
This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.
☆32Jan 26, 2024Updated 2 years ago
lucasondel / amdtk
View on GitHub
☆12Feb 26, 2018Updated 8 years ago
SWivid / AUV
View on GitHub
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
☆28Oct 11, 2025Updated 9 months ago
IDEA-Emdoor-Lab / DistilCodec
View on GitHub
A Neural Audio Codec (NAC) for Universal Audio
☆47May 30, 2025Updated last year
youngsheen / GPST
View on GitHub
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆70Nov 1, 2024Updated last year
PlayVoice / VI-SVC
View on GitHub
VI-SVC model is just VITS without MAS and DurationPredictor.
☆10Nov 9, 2023Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
bovod-sjtu / HoliTok
View on GitHub
HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
☆39Jun 8, 2026Updated last month
cghezhang / AdamW
View on GitHub
https://arxiv.org/abs/1711.05101
☆17Jun 21, 2018Updated 8 years ago
AmphionTeam / Emilia-NV
View on GitHub
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
☆91Sep 18, 2025Updated 10 months ago
kjw11 / CSEnet-ASR
View on GitHub
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12Mar 14, 2025Updated last year
zehanwang01 / FreeBind
View on GitHub
☆22Apr 22, 2025Updated last year
AI-S2-Lab / GPT-Talker
View on GitHub
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
☆45Oct 28, 2024Updated last year
Ruiqi-Yan / URO-Bench
View on GitHub
Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
☆55Sep 2, 2025Updated 10 months ago
ftshijt / speech_evaluation
View on GitHub
A toolkit dedicate for speech evaluation.
☆23Sep 26, 2024Updated last year
cuhealthybrains / MT-LLM
View on GitHub
The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
☆51Apr 7, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ddlBoJack / Awesome-Speech-Pretraining
View on GitHub
Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech.
☆212Jan 18, 2024Updated 2 years ago
zjlww / papers
View on GitHub
Connected Papers knockoff, managing academic papers and citations with graph database.
☆12Dec 26, 2023Updated 2 years ago
MiscellaneousStuff / PhoneLM
View on GitHub
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48Sep 4, 2023Updated 2 years ago
qiuqiangkong / materials_for_students
View on GitHub
☆16Aug 10, 2025Updated 11 months ago
cnaigithub / SpeechDewarping
View on GitHub
Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023
☆27Apr 27, 2023Updated 3 years ago
zeyuxie29 / AudioTime
View on GitHub
☆39Jul 4, 2024Updated 2 years ago
jishengpeng / Languagecodec
View on GitHub
[ACL 2025 Oral] Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
☆208Jun 25, 2025Updated last year