mit-han-lab/patch_conv

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/mit-han-lab/patch_conv)

mit-han-lab / patch_conv

Patch convolution to avoid large GPU memory usage of Conv2D

☆95

Alternatives and similar repositories for patch_conv

Users that are interested in patch_conv are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

mit-han-lab / distrifuser
View on GitHub
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
☆727Dec 2, 2024Updated last year
feifeibear / ChituAttention
View on GitHub
Quantized Attention on GPU
☆45Nov 22, 2024Updated last year
sekstini / basedxl
View on GitHub
☆18Mar 18, 2024Updated 2 years ago
xdit-project / DiTCacheAnalysis
View on GitHub
An auxiliary project analysis of the characteristics of KV in DiT Attention.
☆34Nov 29, 2024Updated last year
xdit-project / DistVAE
View on GitHub
A parallelism VAE avoids OOM for high resolution image generation
☆95May 8, 2026Updated 2 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
chengzeyi / piflux
View on GitHub
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19Nov 18, 2024Updated last year
zhuzilin / pytorch-malloc
View on GitHub
An external memory allocator example for PyTorch.
☆16Aug 10, 2025Updated 11 months ago
NVlabs / COAT
View on GitHub
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆263Aug 9, 2025Updated 11 months ago
cicero225 / llm_pokemon_scaffold
View on GitHub
☆34May 31, 2025Updated last year
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
z-lab / sparselora
View on GitHub
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆76Mar 10, 2026Updated 4 months ago
thu-nics / DiTFastAttn
View on GitHub
☆192Jan 14, 2025Updated last year
feifeibear / PSTensor
View on GitHub
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10Feb 10, 2022Updated 4 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Xiuyu-Li / q-diffusion
View on GitHub
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
☆377Mar 21, 2024Updated 2 years ago
zhuzilin / ring-flash-attention
View on GitHub
Ring attention implementation with flash attention
☆1,036Sep 10, 2025Updated 10 months ago
mit-han-lab / radial-attention
View on GitHub
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
☆603Nov 11, 2025Updated 8 months ago
Infini-AI-Lab / MagicDec
View on GitHub
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆154Dec 4, 2024Updated last year
svg-project / Sparse-VideoGen
View on GitHub
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
☆693Jul 4, 2026Updated 2 weeks ago
zhuzilin / vllm-group
View on GitHub
☆12Nov 5, 2024Updated last year
NoakLiu / FastCache-xDiT
View on GitHub
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
☆52Apr 29, 2026Updated 2 months ago
thib-s / flash-newton-schulz
View on GitHub
My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation.
☆38Apr 30, 2026Updated 2 months ago
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,495Updated this week
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
richardodliu / OpenCodeEval
View on GitHub
☆52Mar 9, 2026Updated 4 months ago
GindaChen / FlexFlashAttention3
View on GitHub
FlexAttention w/ FlashAttention3 Support
☆27Oct 5, 2024Updated last year
SandAI-org / MagiAttention
View on GitHub
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆885Updated this week
xxyux / SpInfer
View on GitHub
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆68Mar 25, 2025Updated last year
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344Aug 28, 2025Updated 10 months ago
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,435Updated this week
mit-han-lab / x-attention
View on GitHub
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆280Jul 6, 2025Updated last year
meta-pytorch / applied-ai
View on GitHub
Applied AI experiments and examples for PyTorch
☆321Aug 22, 2025Updated 10 months ago
abcdabcd987 / LLIRInterpreter
View on GitHub
Single file interpreter (or naive virtual machine) for my intermediate representation. SSA support has been added.
☆15Apr 27, 2016Updated 10 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
zhijian-liu / torchpack
View on GitHub
A neural network training interface based on PyTorch, with a focus on flexibility
☆63Jan 17, 2024Updated 2 years ago
lightmatter-ai / INT-FP-QSim
View on GitHub
Flexible simulator for mixed precision and format simulation of LLMs and vision transformers.
☆51Jul 10, 2023Updated 3 years ago
zhuzilin / chatgpt-desktop
View on GitHub
Desktop version of ChatGPT, support manually set cookie
☆19Dec 9, 2022Updated 3 years ago
tile-ai / tvm
View on GitHub
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆20Jul 13, 2026Updated last week
lmxyy / sige
View on GitHub
[NeurIPS 2022, T-PAMI 2023] Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
☆267Mar 18, 2024Updated 2 years ago
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆1,064Updated this week
xdit-project / xDiT
View on GitHub
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆2,660Jul 14, 2026Updated last week