lucidrains/rvq-vae-gpt

My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation

☆90

Alternatives and similar repositories for rvq-vae-gpt

Users who are interested in rvq-vae-gpt are comparing it to the libraries listed below.

cnaigithub / SpeechDewarping
Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023
☆27 · Apr 27, 2023 · Updated 3 years ago
xinshengwang / robpitch
A pitch detection model trained to be robust against noise and reverberation environments.
☆27 · Jan 21, 2025 · Updated last year
mct10 / RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
☆196 · Jul 12, 2024 · Updated 2 years ago
chomeyama / HN-UnifiedSourceFilterGAN
☆88 · Nov 1, 2022 · Updated 3 years ago
RanaCM / DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12 · May 13, 2024 · Updated 2 years ago
mubtasimahasan / DM-Codec
Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”
☆57 · Jun 1, 2025 · Updated last year
ex3ndr / supervoice-gpt
GPT-style network for phonemization with durations of text
☆68 · Mar 21, 2024 · Updated 2 years ago
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
maum-ai / phaseaug
ICASSP 2023 Accepted
☆191 · May 6, 2024 · Updated 2 years ago
thuhcsi / SnakeGAN
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37 · Apr 25, 2023 · Updated 3 years ago
scutcsq / Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…
☆60 · Apr 4, 2024 · Updated 2 years ago
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
adelacvg / diff-vits
☆39 · Oct 1, 2023 · Updated 2 years ago
Chillee / lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
☆10 · Aug 29, 2023 · Updated 2 years ago
zzw922cn / wesinger2
Synthesized singing voice demos of WeSinger 2 paper.
☆26 · Feb 20, 2023 · Updated 3 years ago
PlayVoice / BigVGAN
BigVGAN with Neural Source-Filter
☆58 · Sep 21, 2023 · Updated 2 years ago
haiciyang / LaDiffCodec
ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.
☆56 · Nov 16, 2025 · Updated 8 months ago
innnky / descript-audio-vae
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
☆92 · Apr 2, 2024 · Updated 2 years ago
reppy4620 / vocoders
My vocoder experiments
☆31 · Jul 26, 2025 · Updated 11 months ago
chomeyama / SiFiGAN
Official implementation of the source-filter HiFiGAN vocoder
☆275 · Jul 29, 2023 · Updated 2 years ago
AbrahamSanders / codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
☆76 · Dec 3, 2025 · Updated 7 months ago
madhavlab / wav2tok
Codebase for the ICLR '23 paper "wav2tok: Deep Sequence Tokenizer for Audio Retrieval"
☆36 · Jun 30, 2026 · Updated 3 weeks ago
jishengpeng / Languagecodec
[ACL 2025 Oral] Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
☆208 · Jun 25, 2025 · Updated last year
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54 · May 1, 2025 · Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆92 · Jun 18, 2024 · Updated 2 years ago
samsad35 / source-filter-vae
[SpeechCom Journal] Learning and controlling the source-filter representation of speech with a variational autoencoder
☆46 · Apr 18, 2023 · Updated 3 years ago
lucasnewman / best-rq-pytorch
Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.
☆135 · Sep 25, 2023 · Updated 2 years ago
roatienza / efficientspeech
PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.
☆182 · Mar 18, 2024 · Updated 2 years ago
TideDancer / iclr22-wctc
☆15 · Mar 15, 2022 · Updated 4 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 · Feb 25, 2023 · Updated 3 years ago
b04901014 / MQTTS
☆260 · May 15, 2023 · Updated 3 years ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
LEEYOONHYUNG / GraphTTS
☆12 · Jul 6, 2023 · Updated 3 years ago
yangdongchao / AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
☆674 · Dec 27, 2023 · Updated 2 years ago
revsic / torch-nansypp
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
☆152 · Feb 11, 2023 · Updated 3 years ago
lucidrains / nim-tokenizer
Implementation of a simple BPE tokenizer, but in Nim
☆22 · Jul 2, 2023 · Updated 3 years ago
justinlovelace / SESD
☆61 · Oct 28, 2024 · Updated last year
lucidrains / esbn-transformer
An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols
☆16 · Aug 3, 2021 · Updated 4 years ago