Qualcomm-AI-research / fastforward
Neural network quantization for research and prototyping
☆41 · Updated 2 weeks ago
Alternatives and similar repositories for fastforward
Users who are interested in fastforward are comparing it to the libraries listed below.
- A block-oriented training approach for inference-time optimization. ☆34 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆233 · Updated 7 months ago
- Quantize transformers to any learned arbitrary 4-bit numeric format ☆50 · Updated 2 weeks ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆115 · Updated last year
- ☆160 · Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- Explore training for quantized models ☆26 · Updated 6 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 · Updated this week
- Context Manager to profile the forward and backward times of PyTorch's nn.Module ☆83 · Updated 2 years ago
- A library for unit scaling in PyTorch ☆133 · Updated 7 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated last month
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- PyTorch-centric eager-mode debugger ☆48 · Updated last year
- ☆169 · Updated 2 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆244 · Updated this week
- A bunch of kernels that might make stuff slower 😉 ☆75 · Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆64 · Updated 3 weeks ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆174 · Updated 3 months ago
- Make Triton easier ☆50 · Updated last year
- Experiment using Tangent to autodiff Triton ☆82 · Updated 2 years ago
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆29 · Updated 2 months ago
- ☆208 · Updated 4 years ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆20 · Updated 3 years ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ☆81 · Updated 6 months ago
- Hacks for PyTorch ☆19 · Updated 2 years ago
- TorchFix - a linter for PyTorch-using code with autofix support ☆152 · Updated 5 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆739 · Updated this week
- Our first fully AI generated deep learning system ☆481 · Updated last week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆95 · Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆341 · Updated last year