jindongli-Ai/LLM-Discrete-Tokenization-Survey

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jindongli-Ai/LLM-Discrete-Tokenization-Survey)

jindongli-Ai / LLM-Discrete-Tokenization-Survey

[TPAMI 2026] Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey.

☆83

Alternatives and similar repositories for LLM-Discrete-Tokenization-Survey

Users that are interested in LLM-Discrete-Tokenization-Survey are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

RayYuki / CodecBench
View on GitHub
☆24Nov 16, 2025Updated 8 months ago
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
primepake / F5-TTS-meanflow-multilingual
View on GitHub
Meanflow and multilingual for F5-TTS model
☆16Aug 23, 2025Updated 10 months ago
Neur-IO / ReVQ
View on GitHub
Explore how to get a VQ-VAE models efficiently!
☆69Jul 24, 2025Updated 11 months ago
XiaomiMiMo / MiMo-Audio-Training
View on GitHub
☆109Oct 16, 2025Updated 9 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
TIGER-AI-Lab / LLM-AMT
View on GitHub
This repository contains the code for our paper "Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering" [EMNLP…
☆14Oct 8, 2024Updated last year
yoongi43 / VRVQ
View on GitHub
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11Apr 10, 2025Updated last year
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
SxJyJay / UniToken
View on GitHub
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…
☆106Apr 23, 2025Updated last year
MuyangDu / T5Voice
View on GitHub
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28Nov 7, 2025Updated 8 months ago
xiaomi-research / tts-prism
View on GitHub
☆47Apr 27, 2026Updated 2 months ago
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
John-Ge / Awesome-Native-Multimodal-Models
View on GitHub
☆35Apr 9, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
yongaifadian1 / MNV-17
View on GitHub
Qwen2.5-Omni fine-tuned on MNV-17 dataset for nonverbal vocalization recognition
☆31Nov 13, 2025Updated 8 months ago
ryuclc / CosyVoice2-GRPO
View on GitHub
A simple implementation for improving CosyVoice2 by GRPO method
☆39May 5, 2026Updated 2 months ago
InternLM / StarBench
View on GitHub
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
☆42Apr 19, 2026Updated 3 months ago
ETH-DISCO / discoder
View on GitHub
Official repo for DisCoder: High-Fidelity Music Vocoder using Neural Audio Codecs presented at ICASSP 2025
☆42Feb 24, 2025Updated last year
FoundationVision / UniTok
View on GitHub
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529Nov 14, 2025Updated 8 months ago
hdtkk / HDT
View on GitHub
[AAAI 2025] Official Implementation of "HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting"
☆19Feb 17, 2025Updated last year
xingchensong / CosyVoice-ttsfrd
View on GitHub
☆25Jun 19, 2025Updated last year
OpenMOSS / MOSS-Audio-Tokenizer
View on GitHub
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i…
☆245Jun 16, 2026Updated last month
channel-io / ch-tts-llasa-rl-grpo
View on GitHub
☆50Apr 20, 2026Updated 3 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
KdaiP / DC-Speech-VAE
View on GitHub
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57Nov 19, 2025Updated 8 months ago
Jiang-Yidi / UniCodec
View on GitHub
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…
☆156May 30, 2025Updated last year
ATH-MaaS / Awesome-Unified-Multimodal-Models
View on GitHub
Awesome Unified Multimodal Models
☆1,302Mar 24, 2026Updated 3 months ago
MoonshotAI / Kimi-Audio-Evalkit
View on GitHub
☆168Nov 20, 2025Updated 8 months ago
youngsheen / SimVQ
View on GitHub
[ICCV 2025] SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
☆327Dec 29, 2024Updated last year
Yibin-Lei / MetaEOL
View on GitHub
Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models"
☆12Jul 25, 2024Updated last year
AbrahamSanders / codec-bpe
View on GitHub
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
☆76Dec 3, 2025Updated 7 months ago
timedomain-tech / ACE_phonemes
View on GitHub
a guide to grapheme-to-phoneme conversion and phoneme list for ace singing voice synthesis engine
☆44Jan 17, 2025Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆120Jun 21, 2026Updated last month
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
mispchallenge / MISP-ICME-AVSR
View on GitHub
☆17Jan 1, 2024Updated 2 years ago
y-ren16 / OV-InstructTTS
View on GitHub
☆22Jan 27, 2026Updated 5 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 5 months ago
NKU-HLT / DIFFA
View on GitHub
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆83Apr 7, 2026Updated 3 months ago
LqNoob / Neural-Codec-and-Speech-Language-Models
View on GitHub
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
☆246Jul 9, 2026Updated last week
bigai-nlco / UltraVoice
View on GitHub
Official Repository of UltraVoice
☆62Oct 28, 2025Updated 8 months ago
exercise-book-yq / Supercodec
View on GitHub
☆51Mar 5, 2026Updated 4 months ago