IFICL / images-that-sound
Official repo for Images that Sound: spectrograms generated by diffusion models that can be seen as images and played as sound.
☆205 · Updated 3 months ago
Related projects:
- Metrics for evaluating music and audio generative models, with a focus on long-form, full-band, and stereo generations. ☆140 · Updated last month
- ☆130 · Updated last week
- AI prediction API of the MusicLang package ☆262 · Updated 5 months ago
- Fine-tune Stable Audio Open with DiT ControlNet. ☆155 · Updated 2 weeks ago
- The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings. ☆131 · Updated 8 months ago
- Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆141 · Updated 5 months ago
- ☆129 · Updated last week
- Refactored/updated version of `stable-audio-tools`, open-source code for audio/music generative models originally by Stabili… ☆111 · Updated last month
- We present a model that can generate accurate 3D sound fields of human bodies from headset microphones and body pose as inputs. ☆82 · Updated 3 months ago
- Code release for https://kovenyu.com/WonderWorld/ ☆15 · Updated last month
- ☆106 · Updated 11 months ago
- ☆74 · Updated 8 months ago
- Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound ☆79 · Updated 2 months ago
- Official implementation of weights2weights ☆98 · Updated last week
- Fine-tune your own MusicGen with LoRA ☆95 · Updated 4 months ago
- Code for Toon3D https://toon3d.studio/ ☆190 · Updated 3 months ago
- Official repo of 𝙄𝙣𝙩𝙧𝙞𝙣𝙨𝙞𝙘 𝙇𝙤𝙍𝘼: 𝘼 𝙂𝙚𝙣𝙚𝙧𝙖𝙡𝙞𝙨𝙩 𝘼𝙥𝙥𝙧𝙤𝙖𝙘𝙝 𝙛𝙤𝙧 𝘿𝙞𝙨𝙘𝙤𝙫𝙚𝙧𝙞𝙣𝙜 𝙆𝙣𝙤𝙬𝙡𝙚𝙙𝙜𝙚… ☆175 · Updated 2 months ago
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model ☆140 · Updated last month
- PyTorch implementation of Universal Source Separation with Weakly Labelled Data. ☆323 · Updated last year
- Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models ☆148 · Updated 3 months ago
- ☆65 · Updated this week
- A novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. ☆194 · Updated last year
- Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language ☆53 · Updated 3 months ago
- Mustango: Toward Controllable Text-to-Music Generation ☆323 · Updated last month
- ☆44 · Updated 3 weeks ago
- Official PyTorch implementation of "Scalable Autoregressive Image Generation with Mamba" ☆95 · Updated 3 weeks ago
- Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, an… ☆290 · Updated 3 months ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆21 · Updated last week
- RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance ☆101 · Updated 3 months ago
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆69 · Updated 9 months ago