A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.
☆34 · Jun 26, 2024 · Updated last year
Alternatives and similar repositories for small-vision
Users who are interested in small-vision are comparing it to the libraries listed below.
- [ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity ☆29 · Jul 14, 2025 · Updated 7 months ago
- ☆32 · Jul 29, 2024 · Updated last year
- [IEEE/CVF CVPR'2022] "ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation", Duolikun Danier, Fan Zhang, David Bull ☆13 · Oct 9, 2023 · Updated 2 years ago
- ☆34 · May 14, 2025 · Updated 9 months ago
- Animatediff implementation. Includes a ControlNet pipeline. ☆19 · Dec 24, 2023 · Updated 2 years ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆42 · Jun 10, 2025 · Updated 8 months ago
- Official implementation for CVPR 2025 paper "AMO Sampler: Enhancing Text Rendering with Overshooting" ☆30 · May 3, 2025 · Updated 9 months ago
- A huge dataset for Document Visual Question Answering ☆20 · Jul 29, 2024 · Updated last year
- [⭐️ WACV 2025 Oral ⭐️] PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition ☆30 · Jun 9, 2025 · Updated 8 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Aug 8, 2024 · Updated last year
- Code for Fast Training of Diffusion Models with Masked Transformers ☆421 · May 15, 2024 · Updated last year
- Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? ☆42 · Jul 26, 2025 · Updated 7 months ago
- ☆43 · May 10, 2025 · Updated 9 months ago
- The official repository of paper "ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection" (N… ☆50 · Oct 23, 2023 · Updated 2 years ago
- ☆29 · Jan 15, 2025 · Updated last year
- Code for the paper DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents, ICML 2024 ☆93 · Jun 12, 2024 · Updated last year
- Generative Equilibrium Transformer ☆27 · Nov 11, 2023 · Updated 2 years ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆227 · Mar 20, 2025 · Updated 11 months ago
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number] ☆66 · Jan 25, 2025 · Updated last year
- Scalable Diffusion Models with State Space Backbone ☆157 · Mar 7, 2024 · Updated last year
- Checkpointable dataset utilities for foundation model training ☆32 · Jan 29, 2024 · Updated 2 years ago
- [NeurIPS 2025] Official code for Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing ☆72 · Oct 12, 2025 · Updated 4 months ago
- LCM Full Cycle Trainer for Ostris - Ai Toolkit ☆16 · Aug 20, 2024 · Updated last year
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆30 · Apr 27, 2024 · Updated last year
- ☆27 · Jan 28, 2026 · Updated last month
- Github repository for the paper Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers. ☆33 · Mar 17, 2025 · Updated 11 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆33 · Jun 30, 2025 · Updated 8 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆73 · Oct 21, 2025 · Updated 4 months ago
- Conformer-based Metric GAN for speech enhancement ☆27 · May 3, 2024 · Updated last year
- The codebase of our paper "Improving the Training of Rectified Flows", NeurIPS 2024 ☆130 · Oct 18, 2024 · Updated last year
- Train VAE like a boss ☆313 · Oct 21, 2024 · Updated last year
- A tool for benchmarking image generation models. ☆33 · Jan 13, 2023 · Updated 3 years ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated last year
- The audio demos with respect to the paper "DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention tra… ☆29 · Jul 25, 2022 · Updated 3 years ago
- Doohickey is a stable diffusion tool for technical artists who want to stay up-to-date with the latest developments in the field. ☆40 · Dec 7, 2022 · Updated 3 years ago
- Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers" ☆166 · Jan 31, 2025 · Updated last year
- Writing FLUX in Triton ☆42 · Sep 22, 2024 · Updated last year
- Consistency Models Made Easy ☆325 · Oct 13, 2024 · Updated last year