npurson / fid-metrics
A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
☆16 · Updated last month
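FID and FVD both end in the same step: the Fréchet distance between two Gaussians fitted to feature embeddings, d² = ‖μ₁ − μ₂‖² + Tr(Σ₁) + Tr(Σ₂) − 2·Tr((Σ₁Σ₂)^{1/2}). The following is a minimal NumPy sketch of that formula, assuming features have already been extracted (commonly Inception-v3 activations for FID and I3D activations for FVD). It is not fid-metrics' own API, and the function names are hypothetical.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2) (hypothetical helper)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of square roots of the eigenvalues of S1 S2.
    # For PSD covariances those eigenvalues are real and non-negative, so we keep
    # the real part and clip tiny negative values caused by floating-point noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def frechet_distance_from_features(feats1, feats2):
    """Fit a Gaussian to each feature set and compare them (hypothetical helper)."""
    # Rows are samples, columns are feature dimensions (at least 2 dimensions assumed).
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    sigma1 = np.cov(feats1, rowvar=False)
    sigma2 = np.cov(feats2, rowvar=False)
    return frechet_distance(mu1, sigma1, mu2, sigma2)
```

For example, `frechet_distance([0, 0], np.eye(2), [1, 0], 4 * np.eye(2))` gives 1 + 2 + 8 − 2·4 = 3. The eigenvalue route avoids a SciPy dependency; production implementations often use `scipy.linalg.sqrtm` instead.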
Alternatives and similar repositories for fid-metrics:
Users interested in fid-metrics are comparing it to the libraries listed below:
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation ☆43 · Updated 5 months ago
- Implementation of the paper "MaskBit: Embedding-free Image Generation from Bit Tokens" ☆46 · Updated 2 weeks ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NIPS 2024] ☆77 · Updated 2 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆44 · Updated last week
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆83 · Updated last year
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization ☆40 · Updated 2 months ago
- Official code for the Diff-Instruct algorithm for one-step diffusion distillation ☆68 · Updated last month
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners ☆137 · Updated 7 months ago
- ☆27 · Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆79 · Updated 7 months ago
- ☆113 · Updated 7 months ago
- Implementation of InstructEdit ☆71 · Updated last year
- This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023) ☆23 · Updated last year
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, currently implemented with AudioLDM2 as the base model. ☆33 · Updated 7 months ago
- ☆123 · Updated this week
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆62 · Updated 4 months ago
- Towards training VQ-VAE models robustly! ☆50 · Updated last month
- [arXiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions ☆30 · Updated last week
- ☆107 · Updated 11 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆24 · Updated last month
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations ☆135 · Updated 7 months ago
- Minimal multi-GPU implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models" ☆28 · Updated 11 months ago
- [NeurIPS 24] Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models ☆36 · Updated 4 months ago
- ☆41 · Updated 2 months ago
- [ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization ☆126 · Updated 8 months ago
- ☆43 · Updated 5 months ago
- Score identity Distillation with Long and Short Guidance for One-Step Text-to-Image Generation ☆47 · Updated last month
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆102 · Updated 4 months ago
- ☆28 · Updated 3 months ago