jnwnlee / video-foley
Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 2025.
☆15 Updated 2 weeks ago
Alternatives and similar repositories for video-foley
Users who are interested in video-foley are comparing it to the repositories listed below.
- ☆38 Updated 6 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆29 Updated 7 months ago
- ☆23 Updated last month
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆21 Updated 4 months ago
- ☆43 Updated 9 months ago
- Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec" ☆28 Updated last month
- [ACM MM24] Official implementation of paper "From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning" ☆29 Updated 5 months ago
- Explaining audio differences using language ☆15 Updated 8 months ago
- ☆10 Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆29 Updated 9 months ago
- Official Implementation and Dataset of paper - DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset ☆15 Updated 6 months ago
- Official code for the CVPR'24 paper Diff-BGM ☆69 Updated last year
- ☆17 Updated 5 months ago
- ☆18 Updated last year
- TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking ☆19 Updated 6 months ago
- ☆41 Updated 6 months ago
- ☆35 Updated 4 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆71 Updated last year
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ☆24 Updated 6 months ago
- ☆14 Updated 11 months ago
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆31 Updated 2 years ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆40 Updated last year
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆52 Updated 6 months ago
- This repository collects papers related to Speech Tokenizer. ☆17 Updated last year
- Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio… ☆18 Updated 10 months ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆27 Updated last month
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆49 Updated last week
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆55 Updated 11 months ago
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆91 Updated 11 months ago
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor ☆18 Updated 2 years ago