Qualcomm-AI-research / fastforward
Neural network quantization for research and prototyping
☆27 · Updated last week
Alternatives and similar repositories for fastforward
Users who are interested in fastforward are comparing it to the libraries listed below.
- A library for unit scaling in PyTorch ☆129 · Updated last month
- A block oriented training approach for inference time optimization. ☆33 · Updated 11 months ago
- Customized matrix multiplication kernels ☆56 · Updated 3 years ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆326 · Updated 7 months ago
- ☆159 · Updated last year
- ☆36 · Updated 8 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- ☆76 · Updated 3 years ago
- ☆154 · Updated 2 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆215 · Updated 2 years ago
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization ☆95 · Updated 3 years ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Updated last week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆92 · Updated 6 months ago
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆99 · Updated last year
- A research library for pytorch-based neural network pruning, compression, and more. ☆162 · Updated 2 years ago
- A simple minimal implementation of Reversible Vision Transformers ☆125 · Updated last year
- Official implementation for Wavelet Feature Maps Compression for Image-to-Image CNNs, NeurIPS 2022. ☆35 · Updated 2 years ago
- ☆210 · Updated 2 years ago
- Dynamic Neural Architecture Search Toolkit ☆30 · Updated 8 months ago
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆73 · Updated 2 years ago
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) ☆31 · Updated 2 years ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ☆57 · Updated 2 weeks ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆64 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆102 · Updated last year
- ☆51 · Updated last year
- Post-training sparsity-aware quantization ☆34 · Updated 2 years ago
- Implementation of "Gradients without backpropagation" paper (https://arxiv.org/abs/2202.08587) using functorch ☆110 · Updated 2 years ago
- ☆19 · Updated 3 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆217 · Updated last year
- ☆43 · Updated last year