wzk1015 / Awesome-Vision-to-Music-Generation
A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
☆15 Updated this week
Alternatives and similar repositories for Awesome-Vision-to-Music-Generation:
Users who are interested in Awesome-Vision-to-Music-Generation compare it to the repositories listed below:
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆169 Updated 2 months ago
- Make Your Training Flexible: Towards Deployment-Efficient Video Models ☆17 Updated 2 weeks ago
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆198 Updated 6 months ago
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R… ☆55 Updated 6 months ago
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation: A framework for generating multimodal music by bridging dif… ☆24 Updated 2 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 Updated 2 weeks ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆97 Updated last week
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆66 Updated 6 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆33 Updated 2 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆66 Updated 5 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆69 Updated 5 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆57 Updated 2 weeks ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆37 Updated this week
- ☆70 Updated 3 weeks ago
- The official implementation of the IJCAI 2024 paper "MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models". ☆41 Updated 6 months ago
- ☆210 Updated 2 weeks ago
- ☆18 Updated 3 weeks ago
- Implementation of the proposed MaskBit from Bytedance AI ☆75 Updated 4 months ago
- XMIDI Dataset: A large-scale symbolic music dataset with emotion and genre labels. ☆18 Updated 2 months ago
- Small audio language model for reasoning ☆50 Updated last week
- Long-Term Rhythmic Video Soundtracker, ICML 2023 ☆57 Updated 8 months ago
- A project for tri-modal LLM benchmarking and instruction tuning. ☆28 Updated last week
- ☆38 Updated 7 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆23 Updated 3 weeks ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆30 Updated 3 months ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆45 Updated 6 months ago
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆51 Updated last month
- This repository aims to collect Transformer-based sound event detection (SED) algorithms. ☆54 Updated 2 months ago
- ☆39 Updated last year
- Official PyTorch implementation of TokenSet. ☆104 Updated last week