kaiidams/soundstream-pytorch

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kaiidams/soundstream-pytorch)

kaiidams / soundstream-pytorch

Unofficial SoundStream implementation of Pytorch with training code and 16kHz pretrained checkpoint

☆80

Alternatives and similar repositories for soundstream-pytorch

Users that are interested in soundstream-pytorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
wesbz / SoundStream
View on GitHub
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
☆429Apr 21, 2022Updated 4 years ago
ex3ndr / supervoice-gpt
View on GitHub
GPT-style network for phonemization with durations of text
☆69Mar 21, 2024Updated 2 years ago
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18Oct 20, 2024Updated last year
p1an-lin-jung / wv_tts
View on GitHub
☆19Mar 22, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
pengzhendong / streaming-vocos
View on GitHub
Streaming Vocos
☆31Jun 10, 2025Updated last year
habla-liaa / encodecmae
View on GitHub
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101Jul 24, 2024Updated last year
choiHkk / Transformer-TTS-V2
View on GitHub
☆25Mar 6, 2024Updated 2 years ago
Taltt / FNSE-SBGAN
View on GitHub
FNSE-SBGAN: Far-field Speech Enhancement with Schrödinger Bridge and Generative Adversarial Networks
☆19May 12, 2025Updated last year
mileskuo42 / AudioMarkBench
View on GitHub
Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking
☆47Aug 23, 2024Updated last year
ZhikangNiu / encodec-pytorch
View on GitHub
unofficial implementation of the High Fidelity Neural Audio Compression
☆176Aug 15, 2024Updated last year
7Xin / DPI-TTS
View on GitHub
☆13Sep 12, 2024Updated last year
kaistmm / fregrad
View on GitHub
[ICASSP 2024] Official code for FreGrad
☆35May 13, 2024Updated 2 years ago
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
facebookresearch / AudioDec
View on GitHub
An Open-source Streaming High-fidelity Neural Audio Codec
☆509Mar 4, 2025Updated last year
zjlww / ardit-web
View on GitHub
☆27Aug 2, 2024Updated last year
modelscope / FunCodec
View on GitHub
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…
☆444Jan 25, 2024Updated 2 years ago
Aria-K-Alethia / BigCodec
View on GitHub
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆217Sep 19, 2024Updated last year
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
Beilong-Tang / TSELM
View on GitHub
Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models
☆59Apr 14, 2025Updated last year
lars76 / fastspeech2-clean
View on GitHub
Clean and modernized implementation of FastSpeech2/LightSpeech using IPA
☆18Aug 16, 2024Updated last year
lifeiteng / naturalspeech3_facodec
View on GitHub
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
☆251Apr 20, 2024Updated 2 years ago
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆118Jun 4, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
WelkinYang / WaveODE
View on GitHub
An ODE-based generative neural vocoder using Rectified Flow
☆58Apr 29, 2023Updated 3 years ago
ArenAcikgoz / Whisper-Alignment
View on GitHub
Forced alignment decoder for Whisper.
☆16Mar 13, 2024Updated 2 years ago
Vyvo-Labs / SpeechPlus
View on GitHub
SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀
☆21May 20, 2025Updated last year
NTIA / alignnet
View on GitHub
Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.
☆18Aug 1, 2025Updated 11 months ago
p0p4k / Matcha-TTS-2
View on GitHub
E2E TTS using Conditional Flow Matching (Experimental*)
☆71Nov 10, 2023Updated 2 years ago
adelacvg / detail_tts
View on GitHub
All generative model in one for better TTS model
☆74Sep 8, 2024Updated last year
yuval-reshef / StreamVC
View on GitHub
An unofficial pytorch implementation of "STREAMVC: REAL-TIME LOW-LATENCY VOICE CONVERSION".
☆83Apr 15, 2025Updated last year
yangdongchao / AcademiCodec
View on GitHub
AcademiCodec: An Open Source Audio Codec Model for Academic Research
☆674Dec 27, 2023Updated 2 years ago
ex3ndr / supervoice-enhance
View on GitHub
Supervoice diffusion enhance
☆28Jul 15, 2024Updated last year
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
naver-ai / RapFlow-TTS
View on GitHub
☆55Jul 16, 2025Updated 11 months ago
light1726 / SpeechTripleNet
View on GitHub
The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"
☆33Nov 23, 2023Updated 2 years ago
hhguo / SoCodec
View on GitHub
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92Dec 20, 2024Updated last year
fakerybakery / utmos
View on GitHub
A toolkit to calculate speech audio quality. Not affiliated with the original authors
☆73Aug 13, 2024Updated last year
FENRlR / MB-iSTFT-VITS2
View on GitHub
Application of MB-iSTFT-VITS components to vits2_pytorch
☆135Dec 29, 2025Updated 6 months ago
Plachtaa / FAcodec
View on GitHub
Training code for FAcodec presented in NaturalSpeech3
☆245Aug 26, 2024Updated last year
youngsheen / GPST
View on GitHub
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆70Nov 1, 2024Updated last year