liuhuadai / ThinkSound

PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

☆13

Alternatives and similar repositories for ThinkSound

Users interested in ThinkSound are comparing it to the repositories listed below.

thuhcsi / SnakeGAN
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37Updated 2 years ago
yangdongchao / ALMTokenizer2
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆26Updated last month
ftshijt / Interspeech2024_DiscreteSpeechChallenge
This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.
☆32Updated last year
streichgeorg / autosing
☆12Updated 5 months ago
xinshengwang / robpitch
A pitch detection model trained to be robust to noise and reverberation.
☆26Updated 5 months ago
Mddct / transformer-vocos
☆28Updated last month
pengzhendong / streaming-vocos
Streaming Vocos
☆27Updated 2 weeks ago
jiaqili3 / DualCodec
A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
☆35Updated 3 weeks ago
yuhanghe01 / RiTTA
Event Relation in Text-to-Audio (TTA) Generation
☆20Updated 4 months ago
b04901014 / vae-gslm
Official Implementation for the paper: A Variational Framework for Improving Naturalness in Generative Spoken Language Models
☆18Updated last week
gwh22 / LAFMA
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆38Updated last year
slp-rl / SpokenStoryCloze
A spoken version of the textual story cloze benchmark
☆17Updated last year
mubtasimahasan / DM-Codec
Source code for DM-Codec.
☆45Updated 3 weeks ago
rishikksh20 / MiniMax-TTS-pytorch
An attempt to replicate the architecture of MiniMaxTTS described in its technical report
☆33Updated last month
Audio-Foundation-Models / ConversationTTS
☆75Updated 3 weeks ago
exercise-book-yq / FreeCodec
FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens
☆21Updated 9 months ago
yangdongchao / ALMTokenizer
The demo page for ALMTokenizer
☆51Updated 2 months ago
Honee-W / FlowSE
Official repository for FlowSE (Interspeech 2025)
☆18Updated 2 weeks ago
JusperLee / Gull-Codec-Training
☆13Updated 3 months ago
lavendery / AudioComposer
☆23Updated 8 months ago
ex3ndr / supervoice-librilight-preprocessed
60k hours of phoneme-aligned audio from audio books
☆18Updated 11 months ago
ZehuaKcrissLi / GTR-Voice
☆13Updated 7 months ago
fundwotsai2001 / Text-to-music-dataset-preparation
A repo that builds text to music datasets from scratch
☆22Updated last month
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆17Updated last year
meaningTeam / tidy-tunes
Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open …
☆21Updated 3 weeks ago
liuhuang31 / HiFTNet-sr
HiFTNet wav/audio super-resolution 16/24 kHz to 48 kHz
☆24Updated last year
shang0712 / HierTTS
☆45Updated 2 years ago
ryota-komatsu / speech_resynth
Speech Resynthesis and Language Modeling
☆19Updated 2 weeks ago
light1726 / SpeechTripleNet
Implementation of the paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"
☆33Updated last year
CODEJIN / XiaoiceSing2
☆19Updated 2 years ago