SonyResearch/SVG_baseline

Source code for reproducing the results reported in our paper: https://arxiv.org/abs/2409.17550

☆14

Alternatives and similar repositories for SVG_baseline

Users who are interested in SVG_baseline are comparing it to the repositories listed below.

mzsun01 / MM-LDM
☆11 · Apr 12, 2024 · Updated 2 years ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆60 · Mar 15, 2026 · Updated 4 months ago
guyyariv / TempoTokens
[AAAI 2024] The official PyTorch implementation of "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation"
☆130 · May 18, 2026 · Updated 2 months ago
OpenNLPLab / TAVGBench
Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation
☆15 · Apr 7, 2025 · Updated last year
anton-jeran / AV-RIR
Audio-Visual Room Impulse Response Estimation
☆24 · Jul 22, 2024 · Updated 2 years ago
MarcoPenne / AMR_MPC-CBF
☆34 · May 13, 2021 · Updated 5 years ago
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆44 · Dec 13, 2024 · Updated last year
lsg1213 / PEAQ_python
Python version of PEAQ (Perceptual Evaluation of Audio Quality)
☆14 · Jul 24, 2025 · Updated last year
dkurzend / ClipClap-GZSL
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
☆23 · Apr 15, 2024 · Updated 2 years ago
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93 · Dec 8, 2023 · Updated 2 years ago
l3das / L3DAS21
☆37 · Jun 22, 2022 · Updated 4 years ago
amazon-science / avgen-eval-toolkit
☆19 · Feb 5, 2026 · Updated 5 months ago
Yeongtae / tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
☆30 · May 28, 2020 · Updated 6 years ago
PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆65 · Jul 2, 2025 · Updated last year
facebookresearch / real-acoustic-fields
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
☆64 · Aug 29, 2024 · Updated last year
AdobeDocs / cc-libraries-api-samples
☆14 · Dec 20, 2021 · Updated 4 years ago
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆21 · Jun 14, 2024 · Updated 2 years ago
l3das / L3DAS23
Official repository supporting the L3DAS23 IEEE ICASSP Grand Challenge
☆16 · Feb 10, 2023 · Updated 3 years ago
IamCreateAI / CycleVAR
[ICCV 2025] CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation
☆18 · Jul 7, 2025 · Updated last year
jnwnlee / video-foley
Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 20…
☆19 · Feb 27, 2026 · Updated 4 months ago
alfredplpl / imagen-mini-girl
Imagen-mini for girl image generation
☆12 · Nov 19, 2022 · Updated 3 years ago
Surrey-UP-Lab / AV-GS
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis
☆14 · Oct 3, 2024 · Updated last year
ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆26 · Oct 1, 2024 · Updated last year
FreedomIntelligence / MTalk-Bench
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
☆20 · Nov 19, 2025 · Updated 8 months ago
ZachNagengast / LAION-Dalle-Scraper
Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel
☆11 · Oct 10, 2023 · Updated 2 years ago
ZFTurbo / MVSEP-CDX23-Cinematic-Sound-Demixing
Model for CDX23 (Cinematic Sound Demixing) contest
☆57 · Jun 24, 2024 · Updated 2 years ago
kyegomez / Mirasol
PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26 · Jan 27, 2025 · Updated last year
Tr1stesse / DirectEdit
[ICML 2026] Official implementation for "DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing".
☆28 · May 5, 2026 · Updated 2 months ago
ShuhongChen / vroid_renderer
CVPR 2023: PAniC-3D, rendering
☆16 · Mar 25, 2023 · Updated 3 years ago
TsinghuaC3I / FS-GEN
Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding.
☆13 · Nov 19, 2024 · Updated last year
RyannDaGreat / peekaboo
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
☆31 · Jun 2, 2024 · Updated 2 years ago
sarulab-speech / ml-audiocaps
Multi-lingual AudioCaps
☆14 · Nov 20, 2023 · Updated 2 years ago
pengyizhou / FD-Bench
☆24 · Aug 14, 2025 · Updated 11 months ago
Oscima2026 / china-patent-drafter-skill
☆38 · May 15, 2026 · Updated 2 months ago
naver / poseembroider
Code for paper "PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation" (ECCV 2024)
☆18 · Nov 18, 2024 · Updated last year
RBenita / DIFFAR
Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
☆32 · Mar 8, 2024 · Updated 2 years ago
csiebler / mlops-demo
Demo for MLOps with Azure Machine Learning
☆11 · Jul 5, 2022 · Updated 4 years ago
weynechen / fastrtc-local-cn
Real-time AI voice conversation using local models, suited to Chinese-language environments.
☆19 · Jun 17, 2025 · Updated last year
facebookresearch / soundvista
soundvista
☆16 · Dec 31, 2025 · Updated 6 months ago