Xiaohao-Liu / Awesome-Vison2Audio
A curated list of Video to Audio Generation
☆33 Updated 5 months ago
Alternatives and similar repositories for Awesome-Vison2Audio:
Users who are interested in Awesome-Vison2Audio are comparing it to the libraries listed below
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆94 Updated 5 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆32 Updated last week
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 Updated 2 weeks ago
- ☆54 Updated 8 months ago
- Official code for CVPR'24 paper Diff-BGM ☆59 Updated 5 months ago
- ☆63 Updated 6 months ago
- ☆210 Updated last week
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆31 Updated 4 months ago
- ☆57 Updated 3 weeks ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆160 Updated 10 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆23 Updated 3 weeks ago
- [ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation ☆71 Updated last year
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆169 Updated 2 months ago
- ☆54 Updated this week
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24) ☆42 Updated 7 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆166 Updated last month
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆33 Updated last week
- ☆46 Updated 2 months ago
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ☆129 Updated last month
- The code and weights for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆12 Updated last month
- Audio-FLAN ☆140 Updated 3 weeks ago
- ☆38 Updated 7 months ago
- The official code for "Dance-to-Music Generation with Encoder-based Textual Inversion" ☆20 Updated 2 weeks ago
- The open source code for LLM-Codec ☆132 Updated 7 months ago
- Trying to reproduce Suno v3 ☆32 Updated 2 months ago
- Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models ☆31 Updated 3 weeks ago
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System. ☆163 Updated 4 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆186 Updated 2 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆52 Updated 4 months ago
- Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models ☆179 Updated 10 months ago