Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
☆126Mar 3, 2026Updated 3 months ago
Alternatives and similar repositories for audio-intelligence
Users that are interested in audio-intelligence are comparing it to the libraries listed below.
- Event Relation in Text-to-Audio (TTA) Generation☆21Feb 26, 2025Updated last year
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆192Feb 28, 2026Updated 3 months ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]☆27May 20, 2025Updated last year
- ☆18May 4, 2025Updated last year
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆65Nov 5, 2025Updated 7 months ago
- ☆30Jul 7, 2025Updated 11 months ago
- ☆121Jun 2, 2026Updated last week
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 9 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆1,140Dec 15, 2025Updated 5 months ago
- Polyphonic generalisation of DDSP☆22Apr 30, 2024Updated 2 years ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆161Mar 26, 2026Updated 2 months ago
- Code for the paper "Toward Fully Self-Supervised Multi-Pitch Estimation".☆25Sep 27, 2025Updated 8 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated last year
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated last year
- Audio-FLAN☆161Sep 23, 2025Updated 8 months ago
- Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]☆57Nov 10, 2025Updated 7 months ago
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …☆178May 20, 2025Updated last year
- Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"☆42May 5, 2024Updated 2 years ago
- A piano music dataset with Audio, Symbolic and Text labels☆34Mar 6, 2025Updated last year
- ☆18Jan 20, 2025Updated last year
- The open source code for LLM-Codec☆147Aug 18, 2024Updated last year
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.☆435Feb 12, 2026Updated 3 months ago
- Unified automatic quality assessment for speech, music, and sound.☆726Jun 5, 2025Updated last year
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆197May 29, 2024Updated 2 years ago
- Official implementation of UniverSR (ICASSP 2026)☆50Apr 9, 2026Updated 2 months ago
- Pytorch implementation of SoundCTM☆101Mar 31, 2025Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆44Jun 13, 2024Updated last year
- ☆36Sep 6, 2025Updated 9 months ago
- Copyright-free Artificial Lyrics Dataset (ISMIR 2024 LBD)☆12Sep 1, 2024Updated last year
- ☆25Jun 19, 2025Updated 11 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…☆57May 22, 2026Updated 2 weeks ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆242Dec 18, 2025Updated 5 months ago
- Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.☆143Aug 13, 2025Updated 9 months ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]☆47Jan 23, 2025Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆204Dec 13, 2024Updated last year
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R…☆61Sep 17, 2024Updated last year