Xiaohao-Liu / Awesome-Vison2Audio
A curated list of Video to Audio Generation
☆37 Updated last week
Alternatives and similar repositories for Awesome-Vison2Audio:
Users who are interested in Awesome-Vison2Audio are comparing it to the repositories listed below
- ☆219 Updated last month
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆41 Updated last month
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆95 Updated 6 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆177 Updated 3 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆37 Updated last week
- ☆64 Updated 7 months ago
- official code for CVPR'24 paper Diff-BGM ☆62 Updated 6 months ago
- ☆24 Updated 3 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆162 Updated 10 months ago
- ☆46 Updated 3 months ago
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ☆142 Updated last month
- small audio language model for reasoning ☆58 Updated last week
- ☆64 Updated 3 weeks ago
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆13 Updated 2 months ago
- A curated list of vision-to-music generation: methods, datasets, evaluation and challenges. ☆56 Updated last week
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆201 Updated 7 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆34 Updated 3 weeks ago
- Audio-FLAN ☆141 Updated last month
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆181 Updated last month
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆27 Updated last month
- ☆62 Updated last month
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆205 Updated last week
- Reproduction of the llama-omni training code ☆60 Updated 3 months ago
- ☆55 Updated 9 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆37 Updated this week
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆33 Updated 7 months ago
- [ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation ☆73 Updated last year
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24) ☆42 Updated 8 months ago
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLM ☆89 Updated last week
- This is the official implementation of MusER (AAAI'24). ☆27 Updated 10 months ago