zugexiaodui / torch_flops
A library for calculating the FLOPs in the forward() process based on torch.fx
☆106 Updated last month
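torch_flops traces a model's forward() with torch.fx and attributes FLOPs to the nodes of the resulting graph. The snippet below is a minimal illustrative sketch of that idea (symbolic tracing plus shape propagation, counting only nn.Linear layers); it is not the torch_flops API, and the helper name `count_linear_flops` is made up for this example.

```python
# A minimal sketch (not the torch_flops API) of FLOP counting via torch.fx:
# symbolically trace forward(), propagate shapes, then sum FLOPs per node.
# Only nn.Linear layers are counted here; count_linear_flops is a hypothetical
# helper name used for illustration.
import math

import torch
import torch.nn as nn
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp


def count_linear_flops(model: nn.Module, example_input: torch.Tensor) -> int:
    traced = symbolic_trace(model)               # fx graph of forward()
    ShapeProp(traced).propagate(example_input)   # annotate nodes with output shapes

    total = 0
    for node in traced.graph.nodes:
        if node.op == "call_module":
            mod = traced.get_submodule(node.target)
            if isinstance(mod, nn.Linear):
                out_shape = node.meta["tensor_meta"].shape
                rows = math.prod(out_shape[:-1])  # batch (and any other leading dims)
                # y = x @ W^T + b: roughly 2 * in_features FLOPs per output element
                total += 2 * rows * mod.in_features * mod.out_features
    return total


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    print(count_linear_flops(net, torch.randn(1, 64)))  # 18944
```

torch_flops builds on the same torch.fx machinery but covers a much broader operator set; see the repository for its actual interface.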
Alternatives and similar repositories for torch_flops:
Users who are interested in torch_flops are comparing it to the libraries listed below.
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio… ☆86 Updated last year
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆100 Updated last month
- ☆189 Updated last year
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆142 Updated last month
- [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer ☆68 Updated last year
- Implementation of Post-training Quantization on Diffusion Models (CVPR 2023) ☆137 Updated 2 years ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆59 Updated 11 months ago
- Awesome list of papers that extend Mamba to various applications. ☆132 Updated last month
- [CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers ☆48 Updated 8 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆80 Updated last month
- [CVPR 2023 Highlight] This is the official implementation of "Stitchable Neural Networks". ☆248 Updated 2 years ago
- Curated list of methods that focus on improving the efficiency of diffusion models ☆44 Updated 10 months ago
- PyTorch code for our paper "ARB-LLM: Alternating Refined Binarizations for Large Language Models" ☆24 Updated last month
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆158 Updated 7 months ago
- ☆163 Updated 3 months ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers. ☆101 Updated 4 months ago
- [CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Mo… ☆63 Updated 9 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆101 Updated 9 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆200 Updated 2 months ago
- Causal depthwise conv1d in CUDA, with a PyTorch interface ☆443 Updated 5 months ago
- [NeurIPS 2023] Structural Pruning for Diffusion Models ☆189 Updated 10 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆65 Updated last year
- [TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio… ☆44 Updated 7 months ago
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆284 Updated 2 months ago
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005 ☆28 Updated 6 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆165 Updated 11 months ago
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆22 Updated 3 months ago
- The official implementation of the NeurIPS 2022 paper Q-ViT. ☆88 Updated last year
- ☆41 Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D ☆86 Updated 3 months ago