yangluo7 / CAME
The official implementation of "CAME: Confidence-guided Adaptive Memory Optimization"
☆89 Updated last month
Alternatives and similar repositories for CAME
Users who are interested in CAME are comparing it to the libraries listed below:
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆101 Updated 9 months ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… ☆126 Updated 2 months ago
- ☆49 Updated last year
- Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO). ☆74 Updated 10 months ago
- [ICML2025] LoRA fine-tuning directly on quantized models. ☆27 Updated 5 months ago
- Minimal Differentiable Image Reward Functions ☆55 Updated 3 weeks ago
- ☆163 Updated 3 months ago
- PyTorch half-precision GEMM library with fused optional bias + optional ReLU/GELU ☆64 Updated 5 months ago
- SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training ☆178 Updated 3 months ago
- The official implementation of Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility ☆47 Updated 3 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆97 Updated 3 weeks ago
- Official PyTorch implementation of the paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆51 Updated 3 months ago
- Code for the NeurIPS 2023 paper "Restart Sampling for Improving Generative Processes" ☆149 Updated last year
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆23 Updated 2 months ago
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆57 Updated 2 months ago
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling. ☆45 Updated 9 months ago
- Low-bit optimizers for PyTorch ☆128 Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D ☆86 Updated 3 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 Updated last year
- Writing FLUX in Triton ☆32 Updated 7 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆57 Updated last month
- PyTorch implementation of "Parallel Sampling of Diffusion Models", NeurIPS 2023 Spotlight ☆136 Updated last year
- A parallel VAE that avoids OOM in high-resolution image generation ☆61 Updated 3 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆142 Updated last month
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆35 Updated 5 months ago
- Triton implementation of bidirectional (non-causal) linear attention ☆46 Updated 3 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆59 Updated 11 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆147 Updated 6 months ago
- The official PyTorch implementation of Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆71 Updated 2 weeks ago
- ☆27 Updated last year