This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
☆88Jun 18, 2024Updated last year
Alternatives and similar repositories for AudioToken
Users that are interested in AudioToken are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆40Apr 14, 2025Updated 11 months ago
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati…☆127Feb 13, 2025Updated last year
- Official PyTorch implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion (ACL 2026)☆17Updated this week
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 7 months ago
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆20Jan 3, 2023Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation☆125Jan 18, 2023Updated 3 years ago
- ☆43Nov 8, 2024Updated last year
- ☆26Jun 5, 2024Updated last year
- The official repo of the paper "StressTest: Can YOUR Speech LM Handle the Stress?"☆20Jul 9, 2025Updated 9 months ago
- ☆69Jul 29, 2023Updated 2 years ago
- BigVGAN with Neural Source-Filter☆56Sep 21, 2023Updated 2 years ago
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730☆131Dec 8, 2023Updated 2 years ago
- Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)☆79Aug 14, 2023Updated 2 years ago
- Official implementation of "Describing Sets of Images with Textual-PCA".☆16Feb 13, 2023Updated 3 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》☆77Jun 9, 2023Updated 2 years ago
- ☆46Apr 16, 2023Updated 2 years ago
- ☆10Sep 19, 2022Updated 3 years ago
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Dec 7, 2023Updated 2 years ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- Official implementation of "Dataset Size Recovery from LoRA Weights" paper.☆34Jun 30, 2024Updated last year
- This repo contains the official PyTorch implementation of "A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement" (…☆28Aug 8, 2022Updated 3 years ago
- An official implementation of ProbeGen☆13Oct 20, 2024Updated last year
- Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.☆30Sep 16, 2022Updated 3 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning☆17Jan 21, 2025Updated last year
- ☆19Jan 8, 2025Updated last year
- ☆30Oct 20, 2021Updated 4 years ago
- An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"☆137Aug 17, 2023Updated 2 years ago
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆155Jul 6, 2024Updated last year
- HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement☆160Jul 16, 2022Updated 3 years ago
- This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)☆242May 1, 2025Updated 11 months ago
- An ODE-based generative neural vocoder using Rectified Flow☆58Apr 29, 2023Updated 2 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆58Apr 17, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?"☆34Feb 21, 2025Updated last year
- Keep track of big models in audio domain, including speech, singing, music etc.☆505Sep 26, 2024Updated last year
- ☆19Feb 2, 2023Updated 3 years ago
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment☆68Jul 5, 2024Updated last year
- ☆16Dec 22, 2017Updated 8 years ago
- The deme page of InstructTTS☆157Feb 10, 2024Updated 2 years ago
- We are committing code.☆44May 18, 2023Updated 2 years ago