to release the source code for reproducing the results reported in our paper: https://arxiv.org/abs/2409.17550
☆14Nov 15, 2024Updated last year
Alternatives and similar repositories for SVG_baseline
Users that are interested in SVG_baseline are comparing it to the libraries listed below.
- ☆11Apr 12, 2024Updated last year
- Audio-Visual Room Impulse Response Estimation☆24Jul 22, 2024Updated last year
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation☆60Mar 15, 2026Updated last week
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati…☆127Feb 13, 2025Updated last year
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation☆14Apr 7, 2025Updated 11 months ago
- ☆33May 13, 2021Updated 4 years ago
- Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"☆44Dec 13, 2024Updated last year
- Python version of PEAQ(Perceptual Evaluation of Audio Quality)☆14Jul 24, 2025Updated 8 months ago
- Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models☆22Apr 15, 2024Updated last year
- [arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models☆36Aug 26, 2025Updated 6 months ago
- Code for paper "PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation" (ECCV 2024)☆18Nov 18, 2024Updated last year
- Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 20…☆17Feb 27, 2026Updated 3 weeks ago
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".☆93Dec 8, 2023Updated 2 years ago
- ☆18Feb 5, 2026Updated last month
- ☆37Jun 22, 2022Updated 3 years ago
- Tacotron 2 - PyTorch implementation with faster-than-realtime inference☆30May 28, 2020Updated 5 years ago
- Code for TIP2026 paper: CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation☆78Feb 6, 2026Updated last month
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆60Jul 2, 2025Updated 8 months ago
- K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models☆38Dec 30, 2025Updated 2 months ago
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark☆61Aug 29, 2024Updated last year
- Official repository supporting the L3DAS23 IEEE ICASSP Grand Challenge☆16Feb 10, 2023Updated 3 years ago
- ☆14Dec 20, 2021Updated 4 years ago
- Imagen-mini for girl image generation☆12Nov 19, 2022Updated 3 years ago
- RFTT: Reasoning with Reinforced Functional Token Tuning☆29Feb 12, 2026Updated last month
- ☆15May 13, 2024Updated last year
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis☆12Oct 3, 2024Updated last year
- [ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.☆357Mar 15, 2026Updated last week
- The repository provides code for EgoMAN model and dataset creation scripts.☆28Dec 31, 2025Updated 2 months ago
- Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence☆19Jun 14, 2024Updated last year
- Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel☆11Oct 10, 2023Updated 2 years ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆34Apr 1, 2025Updated 11 months ago
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Jan 27, 2025Updated last year
- Transforming Text into Dynamic 2D Characters with Openpose Generation☆16Jul 11, 2024Updated last year
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos☆25Oct 1, 2024Updated last year
- CVPR 2023: PAniC-3D, rendering☆16Mar 25, 2023Updated 2 years ago
- Model for CDX23 (Cinematic Sound Demixing) contest☆51Jun 24, 2024Updated last year
- ☆10Mar 8, 2025Updated last year
- The dataset CoLan-150K and the concept decomposition in the paper Concept Lancet (CVPR 2025)☆20Jan 18, 2026Updated 2 months ago
- Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors☆31Jun 2, 2024Updated last year