Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
☆116Mar 3, 2026Updated last month
Alternatives and similar repositories for audio-intelligence
Users that are interested in audio-intelligence are comparing it to the libraries listed below.
- Event Relation in Text-to-Audio (TTA) Generation☆21Feb 26, 2025Updated last year
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆188Feb 28, 2026Updated 2 months ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]☆27May 20, 2025Updated 11 months ago
- ☆18May 4, 2025Updated 11 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆64Nov 5, 2025Updated 5 months ago
- ☆29Jul 7, 2025Updated 9 months ago
- ☆117Updated this week
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 7 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆1,110Dec 15, 2025Updated 4 months ago
- Polyphonic generalisation of DDSP☆22Apr 30, 2024Updated 2 years ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆160Mar 26, 2026Updated last month
- Code for the paper "Toward Fully Self-Supervised Multi-Pitch Estimation".☆25Sep 27, 2025Updated 7 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated 11 months ago
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 10 months ago
- Audio-FLAN☆160Sep 23, 2025Updated 7 months ago
- Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]☆57Nov 10, 2025Updated 5 months ago
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …☆176May 20, 2025Updated 11 months ago
- Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"☆42May 5, 2024Updated last year
- A piano music dataset with Audio, Symbolic and Text labels☆34Mar 6, 2025Updated last year
- ☆18Jan 20, 2025Updated last year
- The open source code for LLM-Codec☆147Aug 18, 2024Updated last year
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.☆425Feb 12, 2026Updated 2 months ago
- Unified automatic quality assessment for speech, music, and sound.☆710Jun 5, 2025Updated 10 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆193May 29, 2024Updated last year
- Pytorch implementation of SoundCTM☆101Mar 31, 2025Updated last year
- Official implementation of UniverSR (ICASSP 2026)☆44Apr 9, 2026Updated 3 weeks ago
- Copyright-free Artificial Lyrics Dataset (ISMIR 2024 LBD)☆12Sep 1, 2024Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆43Jun 13, 2024Updated last year
- ☆36Sep 6, 2025Updated 7 months ago
- ☆25Jun 19, 2025Updated 10 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- This repository contains the training code from the paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…☆57Updated this week
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆242Dec 18, 2025Updated 4 months ago
- Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.☆142Aug 13, 2025Updated 8 months ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]☆47Jan 23, 2025Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆201Dec 13, 2024Updated last year
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R…☆60Sep 17, 2024Updated last year