amazon-science / avgen-eval-toolkit
☆11 Updated last week
Alternatives and similar repositories for avgen-eval-toolkit
Users who are interested in avgen-eval-toolkit are comparing it to the repositories listed below.
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆84 Updated last year
- Code repo for LoCoNet: Long-Short Context Network for Active Speaker Detection ☆36 Updated 2 years ago
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation ☆52 Updated 9 months ago
- ☆34 Updated 2 weeks ago
- ☆21 Updated last year
- Anim-400K: A dataset designed from the ground up for automated dubbing of video ☆108 Updated last year
- DREAM: Diffusion Rectification and Estimation-Adaptive Models (CVPR 2024) ☆40 Updated 4 months ago
- Official implementation for CVPR 2025 paper "AMO Sampler: Enhancing Text Rendering with Overshooting" ☆20 Updated last month
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆64 Updated 4 months ago
- ☆46 Updated 11 months ago
- [ICLR2022] Code for "Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph" ☆54 Updated 2 years ago
- Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video" ☆42 Updated 6 months ago
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch ☆64 Updated 3 years ago
- This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023) ☆23 Updated last year
- The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation ☆30 Updated last month
- VQVAE for video prediction ☆27 Updated 3 years ago
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization ☆49 Updated 6 months ago
- ☆124 Updated last year
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆86 Updated last year
- Official implementation of the paper: Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transform… ☆29 Updated 2 years ago
- Code for Novel View Acoustic Synthesis paper ☆48 Updated last year
- ☆48 Updated 3 months ago
- AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model ☆29 Updated this week
- The project page repo for Neural Dubber. ☆30 Updated last year
- ☆32 Updated 2 years ago
- Unofficial Implementation of E-LatentLPIPS (Ensembled-LatentLPIPS) of Diffusion2GAN ☆40 Updated 11 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆34 Updated 5 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆100 Updated last month
- (wip) Use LAION-AI's CLIP "conditioned prior" to generate CLIP image embeds from CLIP text embeds. ☆27 Updated 2 years ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆117 Updated 8 months ago