RoySheffer/im2wav

Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation

☆125

Alternatives and similar repositories for im2wav

Users interested in im2wav are comparing it to the repositories listed below.

gallilmaimon / DISSC
Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730
☆130 · Dec 8, 2023 · Updated 2 years ago
shahariel / TEAL
TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning
☆18 · Jan 21, 2025 · Updated last year
guyyariv / AudioToken
[InterSpeech 2023] The official PyTorch implementation of: "AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Imag…
☆89 · May 18, 2026 · Updated 2 months ago
slp-rl / SC-PhASE
This repo contains the official PyTorch implementation of "A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement" (…
☆28 · Aug 8, 2022 · Updated 3 years ago
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93 · Dec 8, 2023 · Updated 2 years ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
slp-rl / salmon
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
☆50 · Aug 15, 2025 · Updated 11 months ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆205 · May 29, 2024 · Updated 2 years ago
MoSalama98 / DSiRe
Official implementation of "Dataset Size Recovery from LoRA Weights" paper.
☆34 · Jun 30, 2024 · Updated 2 years ago
v-iashin / SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
☆372 · Jul 12, 2024 · Updated 2 years ago
guyyariv / TempoTokens
[AAAI 2024] The official PyTorch implementation of "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation"
☆130 · May 18, 2026 · Updated 2 months ago
Hadar933 / AdaptiveSpectrumLayer
Official PyTorch Implementation for the "A Deep Inverse-Mapping Model for a Flapping Robotic Wing" Paper (ICLR 2025)
☆22 · Dec 16, 2025 · Updated 7 months ago
eliahuhorwitz / MoTHer
Official PyTorch Implementation for the "Unsupervised Model Tree Heritage Recovery" paper (ICLR 2025).
☆62 · Jul 1, 2025 · Updated last year
AsafShul / PoDD
Official PyTorch Implementation for the "Distilling Datasets Into Less Than One Image" paper.
☆39 · Jun 6, 2024 · Updated 2 years ago
jonkahana / ProbeGen
An official implementation of ProbeGen
☆13 · Oct 20, 2024 · Updated last year
slp-rl / SLM-Discrete-Representations
This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…
☆20 · Jan 3, 2023 · Updated 3 years ago
guyyariv / LaMI
[ACL 2026 Oral] Official implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion
☆19 · Jul 4, 2026 · Updated 2 weeks ago
slp-rl / aero
This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
☆244 · May 1, 2025 · Updated last year
ChenPaulYu / beats-with-you
🎵 Partnership with AI to create Beats
☆11 · Oct 13, 2020 · Updated 5 years ago
heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆29 · Dec 14, 2023 · Updated 2 years ago
maormizrachi / MadVoro
☆20 · Jul 15, 2026 · Updated last week
shlizee / Audeo
☆31 · Feb 4, 2021 · Updated 5 years ago
avishaiElmakies / unsupervised_speech_segmentation_using_slm
☆20 · Jan 8, 2025 · Updated last year
OpenGVLab / LORIS
[ICML2023] Long-Term Rhythmic Video Soundtracker
☆63 · Jul 28, 2025 · Updated 11 months ago
slp-rl / StressTest
The official repo of the paper "StressTest: Can YOUR Speech LM Handle the Stress?"
☆20 · Jun 28, 2026 · Updated 3 weeks ago
apple-yinhan / TQ-SED
☆24 · Mar 19, 2025 · Updated last year
PeihaoChen / regnet
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned S…
☆53 · Dec 15, 2020 · Updated 5 years ago
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆44 · Dec 13, 2024 · Updated last year
karchkha / MelSpec_GPT_VQVAE
Audio Generation model working with GPT-2 and VQVAE compressed representation of MelSpectrograms
☆18 · Oct 8, 2023 · Updated 2 years ago
sukun1045 / video-physics-sound-diffusion
☆49 · Jul 10, 2024 · Updated 2 years ago
slp-rl / SpokenStoryCloze
A spoken version of the textual story cloze benchmark
☆22 · Aug 6, 2023 · Updated 2 years ago
jonkahana / CLIPPR
An official PyTorch implementation for CLIPPR
☆31 · Jul 22, 2023 · Updated 3 years ago
eliahuhorwitz / Conffusion
Official Implementation for the "Conffusion: Confidence Intervals for Diffusion Models" paper.
☆144 · Nov 27, 2022 · Updated 3 years ago
haoheliu / audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
☆390 · Sep 29, 2024 · Updated last year
shansongliu / HumTrans
☆13 · Sep 26, 2023 · Updated 2 years ago
sony / CLIPSep
☆43 · Feb 21, 2023 · Updated 3 years ago
eliahuhorwitz / Spectral-DeTuning
Official PyTorch Implementation for the "Recovering the Pre-Fine-Tuning Weights of Generative Models" paper (ICML 2024).
☆86 · Apr 15, 2025 · Updated last year
barcavia / RealTime-DeepfakeDetection-in-the-RealWorld
Real-Time Deepfake Detection in the Real-World
☆50 · Nov 30, 2024 · Updated last year
Isaaclabe / DGD-Dynamic-3D-Gaussians-Distillation
Official implementation of "DGD: Dynamic 3D Gaussians Distillation".
☆69 · Aug 16, 2024 · Updated last year