This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
☆88Jun 18, 2024Updated last year
Alternatives and similar repositories for AudioToken
Users that are interested in AudioToken are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆40Apr 14, 2025Updated 11 months ago
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati…☆127Feb 13, 2025Updated last year
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener…☆17Jul 1, 2024Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 6 months ago
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆20Jan 3, 2023Updated 3 years ago
- Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation☆124Jan 18, 2023Updated 3 years ago
- ☆43Nov 8, 2024Updated last year
- The official repo of the paper "StressTest: Can YOUR Speech LM Handle the Stress?"☆20Jul 9, 2025Updated 8 months ago
- ☆26Jun 5, 2024Updated last year
- ☆69Jul 29, 2023Updated 2 years ago
- BigVGAN with Neural Source-Filter☆56Sep 21, 2023Updated 2 years ago
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730☆131Dec 8, 2023Updated 2 years ago
- Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)☆79Aug 14, 2023Updated 2 years ago
- Official implementation of "Describing Sets of Images with Textual-PCA".☆16Feb 13, 2023Updated 3 years ago
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》☆77Jun 9, 2023Updated 2 years ago
- ☆46Apr 16, 2023Updated 2 years ago
- ☆10Sep 19, 2022Updated 3 years ago
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Dec 7, 2023Updated 2 years ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- Official implementation of "Dataset Size Recovery from LoRA Weights" paper.☆34Jun 30, 2024Updated last year
- This repo contains the official PyTorch implementation of "A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement" (…☆28Aug 8, 2022Updated 3 years ago
- An official implementation of ProbeGen☆13Oct 20, 2024Updated last year
- Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.☆30Sep 16, 2022Updated 3 years ago
- TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning☆17Jan 21, 2025Updated last year
- ☆19Jan 8, 2025Updated last year
- ☆30Oct 20, 2021Updated 4 years ago
- An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"☆138Aug 17, 2023Updated 2 years ago
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆155Jul 6, 2024Updated last year
- HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement☆159Jul 16, 2022Updated 3 years ago
- This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)☆239May 1, 2025Updated 10 months ago
- An ODE-based generative neural vocoder using Rectified Flow☆58Apr 29, 2023Updated 2 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆58Apr 17, 2024Updated last year
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?"☆34Feb 21, 2025Updated last year
- Keep track of big models in audio domain, including speech, singing, music etc.☆506Sep 26, 2024Updated last year
- ☆19Feb 2, 2023Updated 3 years ago
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment☆68Jul 5, 2024Updated last year
- ☆16Dec 22, 2017Updated 8 years ago
- The deme page of InstructTTS☆157Feb 10, 2024Updated 2 years ago
- We are committing code.☆44May 18, 2023Updated 2 years ago