yuhanghe01/RiTTA

Event Relation in Text-to-Audio (TTA) Generation

☆21

Alternatives and similar repositories for RiTTA

Users who are interested in RiTTA are comparing it to the repositories listed below.

koudounasalkis / voc2vec
This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.
☆58 · Apr 14, 2025 · Updated last year
sony / soundctm
PyTorch implementation of SoundCTM
☆101 · Mar 31, 2025 · Updated last year
Audio-Foundation-Models / ConversationTTS
☆101 · Jan 19, 2026 · Updated 6 months ago
multitel-ai / urban-sound-tagging
1st place solution to the DCASE 2020 - Task 5 - Urban Sound Tagging with Spatiotemporal Context
☆17 · Dec 8, 2022 · Updated 3 years ago
lavendery / AudioComposer
☆27 · Sep 10, 2025 · Updated 10 months ago
streichgeorg / autosing
☆18 · Jan 20, 2025 · Updated last year
YoonjinXD / T-FOLEY
Implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, ac…
☆34 · May 25, 2024 · Updated 2 years ago
ajd12342 / paraspeechclap
Codebase for 'ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining'
☆23 · Jun 20, 2026 · Updated last month
primepake / dac_vae
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38 · Aug 30, 2025 · Updated 10 months ago
FantSun / Speechflow
Speechflow for emotion recognition related information decomposition
☆10 · Jul 27, 2021 · Updated 5 years ago
merlresearch / sebbs
Prediction of sound event bounding boxes (SEBBs)
☆35 · Aug 2, 2024 · Updated last year
RBenita / DIFFAR
Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
☆32 · Mar 8, 2024 · Updated 2 years ago
naver-ai / usdm
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
☆95 · Dec 3, 2024 · Updated last year
SparkAudio / SparkVox
☆37 · Jun 9, 2025 · Updated last year
gladiaio / normalization
A lightweight library for normalizing speech transcripts before computing WER
☆28 · Jul 14, 2026 · Updated 2 weeks ago
qiuk2 / AAR
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆74 · Aug 24, 2024 · Updated last year
xinshengwang / robpitch
A pitch detection model trained to be robust against noise and reverberation environments.
☆27 · Jan 21, 2025 · Updated last year
CompVis / maskflow
MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation
☆28 · Mar 4, 2025 · Updated last year
line / promptttspp
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
☆86 · Oct 11, 2024 · Updated last year
DigitalPhonetics / cyclegan-emotion-transfer
CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition
☆12 · Oct 7, 2019 · Updated 6 years ago
yzyouzhang / Audio_Research_in_US
Audio Research in US. US-based professors who work on audio (music, speech, acoustics). For students who would like to apply for RA, PhD,…
☆27 · Feb 27, 2026 · Updated 5 months ago
michen00 / unified_multilingual_dataset_of_emotional_human_utterances
A unified dataset of multilingual emotional human utterances
☆31 · Jan 16, 2026 · Updated 6 months ago
xiquan-li / FineLAP
[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training
☆36 · Apr 20, 2026 · Updated 3 months ago
LiChaiUSTC / CSL-L2M
☆18 · May 4, 2025 · Updated last year
declare-lab / HyperTTS
☆40 · Apr 15, 2024 · Updated 2 years ago
SonyCSLParis / cae-invar
Learning Complex Basis Functions for Invariant Signal Representations with the Complex Autoencoder
☆38 · Dec 16, 2024 · Updated last year
zxzhao0 / C2SER
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…
☆49 · Mar 3, 2025 · Updated last year
ag027592 / EMO-SUPERB
EMO-SUPERB: a reproducible speech emotion recognition benchmark with leakage-free splits for 6 datasets and 15 speech SSL models (IEEE SL…
☆51 · Updated this week
slSeanWU / beats-conformer-bart-audio-captioner
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41 · Jan 6, 2024 · Updated 2 years ago
google-deepmind / librispeech-long
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …
☆99 · Dec 28, 2024 · Updated last year
AGENDD / RWKV-SpeechChat
RWKV-SpeechChat is a real-time dialogue script based on a frozen 3B RWKV model with trained adapters and initial states. Various trained …
☆29 · Jan 1, 2025 · Updated last year
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆60 · Mar 15, 2026 · Updated 4 months ago
yangdongchao / LLM-Codec
The open source code for LLM-Codec
☆147 · Aug 18, 2024 · Updated last year
bytedance / Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
☆197 · May 29, 2024 · Updated 2 years ago
fundwotsai2001 / Text-to-music-dataset-preparation
A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]
☆28 · May 20, 2025 · Updated last year
XiangLi2022 / CM-TTS
[Findings of NAACL 2024] Source code of paper CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers a…
☆68 · Mar 31, 2024 · Updated 2 years ago
seungheondoh / speech-to-music
Textless Speech-to-Music Retrieval Using Emotion Similarity [ICASSP23]
☆17 · Aug 16, 2023 · Updated 2 years ago
NVIDIA / audio-intelligence
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with syntheti…
☆137 · Mar 3, 2026 · Updated 4 months ago
uthree / ddsp-vocoder
☆12 · Nov 7, 2024 · Updated last year