Elucidated Text-To-Audio (ETTA) is a state-of-the-art text-to-audio model built on a holistic understanding of the design space and trained with synthetic captions.
☆106Mar 3, 2026Updated 2 weeks ago
Alternatives and similar repositories for audio-intelligence
Users who are interested in audio-intelligence are comparing it to the libraries listed below.
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆183Feb 28, 2026Updated 2 weeks ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025]☆27May 20, 2025Updated 10 months ago
- ☆18May 4, 2025Updated 10 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆64Nov 5, 2025Updated 4 months ago
- ☆28Jul 7, 2025Updated 8 months ago
- ☆117Feb 26, 2026Updated 3 weeks ago
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 6 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆1,019Dec 15, 2025Updated 3 months ago
- Polyphonic generalisation of DDSP☆22Apr 30, 2024Updated last year
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆156Mar 24, 2025Updated 11 months ago
- Code for the paper "Toward Fully Self-Supervised Multi-Pitch Estimation".☆23Sep 27, 2025Updated 5 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆153May 30, 2025Updated 9 months ago
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 9 months ago
- Audio-FLAN☆160Sep 23, 2025Updated 5 months ago
- Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]☆58Nov 10, 2025Updated 4 months ago
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …☆172May 20, 2025Updated 10 months ago
- Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"☆40May 5, 2024Updated last year
- A piano music dataset with Audio, Symbolic and Text labels☆34Mar 6, 2025Updated last year
- ☆18Jan 20, 2025Updated last year
- The open source code for LLM-Codec☆145Aug 18, 2024Updated last year
- Unified automatic quality assessment for speech, music, and sound.☆694Jun 5, 2025Updated 9 months ago
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.☆422Feb 12, 2026Updated last month
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆190May 29, 2024Updated last year
- PyTorch implementation of SoundCTM☆101Mar 31, 2025Updated 11 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆43Jun 13, 2024Updated last year
- ☆36Sep 6, 2025Updated 6 months ago
- Copyright-free Artificial Lyrics Dataset (ISMIR 2024 LBD)☆12Sep 1, 2024Updated last year
- ☆25Jun 19, 2025Updated 9 months ago
- Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.☆140Aug 13, 2025Updated 7 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- This repository contains the training code from the paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…☆52Updated this week
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- Official implementation of the WildFX dataset generation pipeline.☆15Oct 21, 2025Updated 5 months ago
- MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]☆46Jan 23, 2025Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆197Dec 13, 2024Updated last year
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆96Oct 9, 2025Updated 5 months ago
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R…☆60Sep 17, 2024Updated last year