facebookexperimental / protoquant
Prototype routines for GPU quantization written using PyTorch.
☆21 Updated this week
Alternatives and similar repositories for protoquant
Users who are interested in protoquant are comparing it to the libraries listed below.
- Torch Distributed Experimental ☆117 Updated last year
- Experiment of using Tangent to autodiff triton ☆82 Updated 2 years ago
- A block oriented training approach for inference time optimization. ☆34 Updated last year
- Hacks for PyTorch ☆19 Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆226 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 Updated last month
- PyTorch RFCs (experimental) ☆138 Updated 8 months ago
- ☆160 Updated 2 years ago
- PyTorch centric eager mode debugger ☆48 Updated last year
- ☆21 Updated 11 months ago
- ☆71 Updated 10 months ago
- ☆28 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated 2 years ago
- TORCH_TRACE parser for PT2 ☆76 Updated this week
- Context Manager to profile the forward and backward times of PyTorch's nn.Module ☆83 Updated 2 years ago
- extensible collectives library in triton ☆95 Updated 10 months ago
- ☆41 Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆46 Updated 2 years ago
- ☆124 Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆329 Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆28 Updated 2 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 Updated last year
- ☆115 Updated last year
- A library for unit scaling in PyTorch ☆133 Updated 7 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 Updated 4 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 Updated last month
- Explore training for quantized models ☆26 Updated 7 months ago
- TorchFix - a linter for PyTorch-using code with autofix support ☆152 Updated 5 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆115 Updated last year