snap-research / AVLink
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
☆16Updated 6 months ago
Alternatives and similar repositories for AVLink
Users who are interested in AVLink are comparing it to the repositories listed below.
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos☆25Updated last year
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer☆41Updated last week
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener…☆17Updated last year
- Music production for silent film clips.☆32Updated 9 months ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation☆14Updated 10 months ago
- ☆30Updated 10 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …☆88Updated last year
- [ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model☆53Updated 3 months ago
- [ICML2023] Long-Term Rhythmic Video Soundtracker☆61Updated 6 months ago
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation☆57Updated last year
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…☆29Updated 3 weeks ago
- [🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …☆28Updated 3 months ago
- ☆24Updated last year
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Updated 2 years ago
- ☆77Updated 9 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆122Updated 5 months ago
- Explore how to get VQ-VAE models efficiently!☆67Updated 6 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.☆34Updated 10 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆72Updated 3 months ago
- Official Implementation (Pytorch) of "Constant Acceleration Flow", NeurIPS 2024☆35Updated 3 months ago
- Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"☆43Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆46Updated 4 months ago
- ☆58Updated last year
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".☆93Updated 2 years ago
- LVAS-Agent Code Base☆22Updated 9 months ago
- ☆49Updated 9 months ago
- official code for CVPR'24 paper Diff-BGM☆72Updated last year
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆75Updated 4 months ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Updated last year
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆80Updated 9 months ago