This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
☆88Jun 18, 2024Updated last year
Alternatives and similar repositories for AudioToken
Users who are interested in AudioToken are comparing it to the libraries listed below
- ☆40Apr 14, 2025Updated 10 months ago
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati…☆129Feb 13, 2025Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 6 months ago
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆20Jan 3, 2023Updated 3 years ago
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener…☆17Jul 1, 2024Updated last year
- "SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"☆77Jun 9, 2023Updated 2 years ago
- ☆69Jul 29, 2023Updated 2 years ago
- BigVGAN with Neural Source-Filter☆56Sep 21, 2023Updated 2 years ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- ☆42Nov 8, 2024Updated last year
- An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"☆137Aug 17, 2023Updated 2 years ago
- ☆10Sep 19, 2022Updated 3 years ago
- Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation☆125Jan 18, 2023Updated 3 years ago
- Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo…☆20Sep 1, 2023Updated 2 years ago
- Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)☆79Aug 14, 2023Updated 2 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆58Apr 17, 2024Updated last year
- ☆26Jun 5, 2024Updated last year
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Dec 7, 2023Updated 2 years ago
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin…☆64May 19, 2023Updated 2 years ago
- HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement☆159Jul 16, 2022Updated 3 years ago
- Speaker embedding for VI-SVC and VI-SVS, also for VITS; use this to replace the ID to implement voice cloning.☆30Sep 16, 2022Updated 3 years ago
- ☆46Apr 16, 2023Updated 2 years ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Jun 2, 2023Updated 2 years ago
- ☆61Nov 4, 2023Updated 2 years ago
- An ODE-based generative neural vocoder using Rectified Flow☆58Apr 29, 2023Updated 2 years ago
- An Open-source Streaming High-fidelity Neural Audio Codec☆498Mar 4, 2025Updated 11 months ago
- The demo page of InstructTTS☆157Feb 10, 2024Updated 2 years ago
- Keep track of big models in audio domain, including speech, singing, music etc.☆506Sep 26, 2024Updated last year
- A toolkit for any-to-any encoder-decoder voice conversion systems☆84Aug 10, 2023Updated 2 years ago
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆155Jul 6, 2024Updated last year
- TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion☆148Jan 15, 2024Updated 2 years ago
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023☆27Apr 27, 2023Updated 2 years ago
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment☆68Jul 5, 2024Updated last year
- ICASSP 2023 Accepted☆190May 6, 2024Updated last year
- Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)☆154Feb 1, 2023Updated 3 years ago
- ☆43Jan 13, 2025Updated last year
- This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …☆57Aug 9, 2025Updated 6 months ago
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing☆71Dec 2, 2022Updated 3 years ago
- The official repo of the paper "StressTest: Can YOUR Speech LM Handle the Stress?"☆20Jul 9, 2025Updated 7 months ago