NVIDIA / elucidated-text-to-audio

Elucidated Text-To-Audio (ETTA) is a state-of-the-art (SOTA) text-to-audio model, built on a holistic understanding of the design space and trained with synthetic captions.

☆55

Alternatives and similar repositories for elucidated-text-to-audio

Users who are interested in elucidated-text-to-audio are comparing it to the repositories listed below.

yangdongchao / UniAudio2
The open-source code of UniAudio2.0
☆73 Updated 3 weeks ago
haoheliu / SemantiCodec
☆44 Updated last year
yangdongchao / ALMTokenizer2
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆42 Updated 3 weeks ago
thuhcsi / SnakeGAN
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37 Updated 2 years ago
slSeanWU / beats-conformer-bart-audio-captioner
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆38 Updated last year
gwh22 / LAFMA
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆39 Updated last year
ftshijt / Interspeech2024_DiscreteSpeechChallenge
This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.
☆32 Updated last year
fundwotsai2001 / Text-to-music-dataset-preparation
A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]
☆25 Updated 4 months ago
yuhanghe01 / RiTTA
Event Relation in Text-to-Audio (TTA) Generation
☆20 Updated 7 months ago
Audio-Foundation-Models / ConversationTTS
☆79 Updated 2 months ago
AlanBaade / SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆58 Updated 3 months ago
MWM-io / nansypp
Unofficial implementation of NANSY++ in PyTorch Lightning
☆50 Updated last year
xiquan-li / MeanAudio
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆95 Updated last month
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆81 Updated 9 months ago
asappresearch / simple-tts
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
☆54 Updated last year
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆49 Updated 5 months ago
XiaoyuBIE1994 / SDCodec
(ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec
☆38 Updated 4 months ago
AmphionTeam / TaDiCodec
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆32 Updated last week
lesterphillip / serenade
A Singing Style Conversion Framework Based On Audio Infilling
☆26 Updated 5 months ago
nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆48 Updated last week
mubtasimahasan / DM-Codec
Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”
☆53 Updated 4 months ago
yangdongchao / ALMTokenizer
The demo page for ALMTokenizer
☆53 Updated 5 months ago
qiuqiangkong / audio_flow
☆106 Updated last month
innnky / descript-audio-vae
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
☆81 Updated last year
DiFlow-TTS / DiFlow-TTS
DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
☆52 Updated last week
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆48 Updated 11 months ago
gyt1145028706 / XY-Tokenizer
This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs. Demos, technical insigh…
☆74 Updated last week
zengchang233 / xiaoicesing2
The source code for the paper XiaoiceSing2 (INTERSPEECH 2023)
☆47 Updated last year
york135 / MIRMLPop
The MIR-MLPop dataset and the official implementation of the paper "MIR-MLPop: A Multilingual Pop Music Dataset with Time-Aligned Lyrics …
☆29 Updated last year
zhai-lw / SQCodec
A lightweight audio codec based on a single quantizer
☆66 Updated last month