Elucidated Text-To-Audio (ETTA) is a state-of-the-art text-to-audio model that takes a holistic view of the design space and is trained with synthetic captions.
☆129 · Mar 3, 2026 · Updated 3 months ago
Alternatives and similar repositories for audio-intelligence
Users that are interested in audio-intelligence are comparing it to the libraries listed below.
- Event Relation in Text-to-Audio (TTA) Generation ☆21 · Feb 26, 2025 · Updated last year
- The official repository of the SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆193 · Feb 28, 2026 · Updated 4 months ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025] ☆28 · May 20, 2025 · Updated last year
- ☆18 · May 4, 2025 · Updated last year
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆66 · Nov 5, 2025 · Updated 7 months ago
- ☆30 · Jul 7, 2025 · Updated 11 months ago
- ☆126 · Jun 22, 2026 · Updated last week
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 · Feb 9, 2025 · Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme… ☆45 · Sep 5, 2025 · Updated 9 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models ☆1,146 · Dec 15, 2025 · Updated 6 months ago
- Polyphonic generalisation of DDSP ☆22 · Apr 30, 2024 · Updated 2 years ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆162 · Mar 26, 2026 · Updated 3 months ago
- Code for the paper "Toward Fully Self-Supervised Multi-Pitch Estimation". ☆25 · Sep 27, 2025 · Updated 9 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 · May 30, 2025 · Updated last year
- Speech Resynthesis and Language Modeling ☆27 · Jun 11, 2025 · Updated last year
- Audio-FLAN ☆161 · Sep 23, 2025 · Updated 9 months ago
- Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024] ☆57 · Nov 10, 2025 · Updated 7 months ago
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for … ☆181 · May 20, 2025 · Updated last year
- Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription" ☆42 · May 5, 2024 · Updated 2 years ago
- A piano music dataset with Audio, Symbolic and Text labels ☆34 · Mar 6, 2025 · Updated last year
- ☆18 · Jan 20, 2025 · Updated last year
- The open source code for LLM-Codec ☆147 · Aug 18, 2024 · Updated last year
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆436 · Feb 12, 2026 · Updated 4 months ago
- Unified automatic quality assessment for speech, music, and sound. ☆733 · Jun 5, 2025 · Updated last year
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆198 · May 29, 2024 · Updated 2 years ago
- Official implementation of UniverSR (ICASSP 2026) ☆55 · Apr 9, 2026 · Updated 2 months ago
- PyTorch implementation of SoundCTM ☆101 · Mar 31, 2025 · Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆44 · Jun 13, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2025 · Updated 9 months ago
- Copyright-free Artificial Lyrics Dataset (ISMIR 2024 LBD) ☆12 · Sep 1, 2024 · Updated last year
- ☆25 · Jun 19, 2025 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Nov 1, 2024 · Updated last year
- This repository contains the training code from the paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without… ☆58 · Jun 17, 2026 · Updated 2 weeks ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆243 · Dec 18, 2025 · Updated 6 months ago
- Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting. ☆143 · Aug 13, 2025 · Updated 10 months ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆71 · Nov 10, 2023 · Updated 2 years ago
- MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024] ☆49 · Jan 23, 2025 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆207 · Dec 13, 2024 · Updated last year
- This is the official repository of the ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R… ☆62 · Sep 17, 2024 · Updated last year