ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆18 · Updated 5 months ago
Alternatives and similar repositories for action2sound:
Users who are interested in action2sound are comparing it to the repositories listed below:
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆64 · Updated 6 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆19 · Updated 2 months ago
- Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆44 · Updated last month
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆78 · Updated 2 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆92 · Updated 4 months ago
- Codebase and project page for EDMSound ☆34 · Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆80 · Updated 8 months ago
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆30 · Updated 2 years ago
- Long-Term Rhythmic Video Soundtracker, ICML 2023 ☆56 · Updated 8 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆50 · Updated last month
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆29 · Updated 4 months ago
- The official implementation of the IJCAI 2024 paper "MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models". ☆39 · Updated 6 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆22 · Updated 6 months ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆51 · Updated 10 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆46 · Updated 6 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆32 · Updated last week
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆84 · Updated last year
- ☆16 · Updated last year
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆34 · Updated this week
- Official code for the CVPR'24 paper Diff-BGM ☆57 · Updated 5 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆51 · Updated 4 months ago
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆178 · Updated 11 months ago
- ☆39 · Updated 2 years ago
- Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models ☆177 · Updated 9 months ago
- ☆52 · Updated 8 months ago
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆30 · Updated last year