guyyariv/AudioToken

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/guyyariv/AudioToken)

guyyariv / AudioToken

[InterSpeech 2023] The official PyTorch implementation of: "AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation"

☆89

Alternatives and similar repositories for AudioToken

Users that are interested in AudioToken are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kaist-ami / Sound2Scene
View on GitHub
☆42Apr 14, 2025Updated last year
guyyariv / TempoTokens
View on GitHub
[AAAI 2024] The official PyTorch implementation of "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation"
☆130May 18, 2026Updated 2 months ago
taegyeong-lee / Generating-Realistic-Images-from-In-the-wild-Sounds
View on GitHub
Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023
☆12Aug 24, 2025Updated 11 months ago
guyyariv / LaMI
View on GitHub
[ACL 2026 Oral] Official implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion
☆19Jul 4, 2026Updated 2 weeks ago
RoySheffer / im2wav
View on GitHub
Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation
☆125Jan 18, 2023Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
slp-rl / SLM-Discrete-Representations
View on GitHub
This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…
☆20Jan 3, 2023Updated 3 years ago
BurakCanBiner / SonicDiffusion
View on GitHub
☆43Nov 8, 2024Updated last year
Dapwner / CVAE-Tacotron
View on GitHub
☆26Jun 5, 2024Updated 2 years ago
hs-oh-prml / DiffProsody
View on GitHub
☆69Jul 29, 2023Updated 2 years ago
PlayVoice / BigVGAN
View on GitHub
BigVGAN with Neural Source-Filter
☆58Sep 21, 2023Updated 2 years ago
gallilmaimon / DISSC
View on GitHub
Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730
☆130Dec 8, 2023Updated 2 years ago
kuai-lab / sound-guided-semantic-image-manipulation
View on GitHub
Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)
☆80Aug 14, 2023Updated 2 years ago
OdedH / textual-pca
View on GitHub
Official implementation of "Describing Sets of Images with Textual-PCA".
☆16Feb 13, 2023Updated 3 years ago
ga642381 / SpeechGen
View on GitHub
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77Jun 9, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
shang0712 / HierTTS
View on GitHub
☆47Apr 16, 2023Updated 3 years ago
Chung-I / youtube-asr-crawler
View on GitHub
☆10Sep 19, 2022Updated 3 years ago
slp-rl / SC-PhASE
View on GitHub
This repo contains the official PyTorch implementation of "A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement" (…
☆28Aug 8, 2022Updated 3 years ago
Aria-K-Alethia / laughter-synthesis
View on GitHub
Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…
☆77Jul 16, 2023Updated 3 years ago
ku-vai / TPoS
View on GitHub
This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)
☆25Dec 7, 2023Updated 2 years ago
PlayVoice / VI-Speaker
View on GitHub
Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.
☆30Sep 16, 2022Updated 3 years ago
jonkahana / ProbeGen
View on GitHub
An official implementation of ProbeGen
☆13Oct 20, 2024Updated last year
shahariel / TEAL
View on GitHub
TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning
☆18Jan 21, 2025Updated last year
MoSalama98 / DSiRe
View on GitHub
Official implementation of "Dataset Size Recovery from LoRA Weights" paper.
☆34Jun 30, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
avishaiElmakies / unsupervised_speech_segmentation_using_slm
View on GitHub
☆20Jan 8, 2025Updated last year
idansc / fga
View on GitHub
☆30Oct 20, 2021Updated 4 years ago
gmltmd789 / UnitSpeech
View on GitHub
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
☆137Aug 17, 2023Updated 2 years ago
yzxing87 / Seeing-and-Hearing
View on GitHub
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆155Jul 6, 2024Updated 2 years ago
slp-rl / aero
View on GitHub
This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
☆244May 1, 2025Updated last year
rishikksh20 / HiFiplusplus-pytorch
View on GitHub
HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement
☆160Jul 16, 2022Updated 4 years ago
WelkinYang / WaveODE
View on GitHub
An ODE-based generative neural vocoder using Rectified Flow
☆58Apr 29, 2023Updated 3 years ago
swimmiing / ACL-SSL
View on GitHub
Repository of the IJCV'26 & WACV'24 paper
☆34Apr 27, 2026Updated 2 months ago
liusongxiang / Large-Audio-Models
View on GitHub
Keep track of big models in audio domain, including speech, singing, music etc.
☆515Jul 3, 2026Updated 3 weeks ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
roger-tseng / av-superb
View on GitHub
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58Apr 17, 2024Updated 2 years ago
idansc / HighOrderAtten
View on GitHub
☆16Dec 22, 2017Updated 8 years ago
CODEJIN / XiaoiceSing2
View on GitHub
☆19Feb 2, 2023Updated 3 years ago
RickyL-2000 / AlignSTS
View on GitHub
Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment
☆68Jul 5, 2024Updated 2 years ago
vectominist / spin
View on GitHub
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin…
☆65May 19, 2023Updated 3 years ago
yangdongchao / InstructTTS
View on GitHub
The deme page of InstructTTS
☆158Feb 10, 2024Updated 2 years ago
kuai-lab / soundini-official
View on GitHub
We are committing code.
☆44May 18, 2023Updated 3 years ago