☆75Jan 8, 2024Updated 2 years ago
Alternatives and similar repositories for SonicVisionLM
Users who are interested in SonicVisionLM are comparing it to the repositories listed below
- This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)☆25Dec 7, 2023Updated 2 years ago
- [ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation☆78Mar 29, 2024Updated last year
- Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].☆20Sep 19, 2024Updated last year
- ☆11Feb 9, 2024Updated 2 years ago
- Official code release for "TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion", accepted at ICIST 2023☆12Mar 17, 2024Updated last year
- ☆13Dec 18, 2023Updated 2 years ago
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'☆13Jun 16, 2024Updated last year
- ☆11Feb 8, 2024Updated 2 years ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)☆12Jun 1, 2023Updated 2 years ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)☆110Sep 15, 2025Updated 5 months ago
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- Video Background Music Generation Using Unpaired Audio-Visual Data☆30Oct 8, 2024Updated last year
- ☆10May 15, 2021Updated 4 years ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation☆14Apr 7, 2025Updated 10 months ago
- Cross-platform ChatGPTBox, supporting GPT/DALL-E/Gemini API.☆18Apr 30, 2025Updated 10 months ago
- Bookmarklet to pull and run hugging face GGUF models in Ollama☆17Oct 17, 2024Updated last year
- Fetches transcripts from YouTube videos, including private ones with granted access, and optionally downloads the videos. Does not suppor…☆16Apr 17, 2024Updated last year
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners☆155Jul 6, 2024Updated last year
- ☆62Jun 15, 2025Updated 8 months ago
- We are committing code.☆44May 18, 2023Updated 2 years ago
- A text-to-audio model for generating text-conditioned drum beats☆19Apr 25, 2023Updated 2 years ago
- AudioLDM text to audio colab☆19Nov 6, 2023Updated 2 years ago
- ☆15Jun 15, 2022Updated 3 years ago
- 📊 Research-focused SDXL training framework exploring novel optimization approaches. Goals include enhanced image quality, training stabi…☆21Jun 7, 2025Updated 8 months ago
- ☆17Jan 10, 2024Updated 2 years ago
- Please visit https://thuhcsi.github.io/SnakeGAN/☆37Apr 25, 2023Updated 2 years ago
- ☆19Sep 4, 2024Updated last year
- ☆15Sep 20, 2023Updated 2 years ago
- LevelPixel Nodes for ComfyUI☆24Feb 24, 2026Updated last week
- ☆19May 19, 2024Updated last year
- ☆40Apr 14, 2025Updated 10 months ago
- ☆19Jan 15, 2024Updated 2 years ago
- Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)☆17Oct 12, 2021Updated 4 years ago
- music denoising network☆16Sep 24, 2024Updated last year
- ☆22Jan 15, 2024Updated 2 years ago
- A web application for playing 20 Questions to crowdsource common sense. 🤖☆16Sep 29, 2022Updated 3 years ago
- ☆45Jun 11, 2024Updated last year
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Sep 11, 2024Updated last year
- This repo contains conv-tasnet for basis-melgan. If you want to get code of basis-melgan, please refer to FastVocoder.☆21Jul 21, 2021Updated 4 years ago