RoySheffer / im2wav
Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation"
☆102 · Updated last year
Related projects:
- Official Implementation of EnCLAP (ICASSP 2024) ☆88 · Updated 3 months ago
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730 ☆122 · Updated 9 months ago
- Toward Universal Text-to-Music-Retrieval (TTMR) [ICASSP23] ☆111 · Updated last year
- AudioLDM training, finetuning, evaluation and inference. ☆191 · Updated 3 months ago
- Unofficial download repository for MusicCaps ☆41 · Updated last year
- A collection of audio autoencoders, in PyTorch. ☆37 · Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆75 · Updated 3 months ago
- ☆35 · Updated last year
- Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆141 · Updated 5 months ago
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆69 · Updated 9 months ago
- This package aims at simplifying the download of the AudioCaps dataset. ☆29 · Updated 9 months ago
- Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models ☆148 · Updated 3 months ago
- VoiceLDM: Text-to-Speech with Environmental Context ☆157 · Updated last month
- Audio Captioning datasets for PyTorch. ☆98 · Updated 2 weeks ago
- A simple library for Fréchet Audio Distance (FAD) calculation ☆137 · Updated last week
- Unsupervised Rhythm Modeling for Voice Conversion ☆78 · Updated last year
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆81 · Updated last month
- This repository contains metadata for the WavCaps dataset and code for downstream tasks. ☆194 · Updated last month
- The latent diffusion model for text-to-music generation. ☆151 · Updated 7 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆68 · Updated 2 months ago
- Official code for the CVPR'24 paper Diff-BGM ☆38 · Updated 5 months ago
- Refactored / updated version of `stable-audio-tools` which is an open-source code for audio/music generative models originally by Stabili… ☆111 · Updated last month
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆169 · Updated 3 weeks ago
- Long-Term Rhythmic Video Soundtracker, ICML2023 ☆54 · Updated 2 months ago
- A lightweight library for Fréchet Audio Distance calculation. ☆230 · Updated 2 weeks ago
- Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021) ☆338 · Updated 2 months ago
- A toolbox that provides hackable building blocks for generic 1D/2D/3D UNets, in PyTorch. ☆77 · Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆99 · Updated 5 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆64 · Updated 3 weeks ago
- ☆44 · Updated 2 months ago