facebookexperimental / protoquant
Prototype routines for GPU quantization written using PyTorch.
☆21Updated 2 months ago
Alternatives and similar repositories for protoquant
Users who are interested in protoquant are comparing it to the libraries listed below.
- Torch Distributed Experimental☆117Updated last year
 - Experiment of using Tangent to autodiff triton☆80Updated last year
 - PyTorch centric eager mode debugger☆48Updated 10 months ago
 - ☆158Updated 2 years ago
 - ☆21Updated 8 months ago
 - Hacks for PyTorch☆19Updated 2 years ago
 - A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…☆161Updated last month
 - This repository contains the experimental PyTorch native float8 training UX☆223Updated last year
 - A block oriented training approach for inference time optimization.☆33Updated last year
 - PyTorch RFCs (experimental)☆135Updated 5 months ago
 - torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆181Updated 2 months ago
 - Memory Optimizations for Deep Learning (ICML 2023)☆110Updated last year
 - ☆28Updated 9 months ago
 - A place to store reusable transformer components of my own creation or found on the interwebs☆59Updated 2 weeks ago
 - CUDA and Triton implementations of Flash Attention with SoftmaxN.☆73Updated last year
 - CUDA implementation of autoregressive linear attention, with all the latest research findings☆45Updated 2 years ago
 - extensible collectives library in triton☆90Updated 7 months ago
 - Make triton easier☆48Updated last year
 - ☆121Updated last year
 - Repository for CPU Kernel Generation for LLM Inference☆26Updated 2 years ago
 - ☆71Updated 7 months ago
 - ☆112Updated last year
 - TorchFix - a linter for PyTorch-using code with autofix support☆148Updated 2 months ago
 - FlexAttention w/ FlashAttention3 Support☆27Updated last year
 - APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…☆25Updated last week
 - Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆42Updated last year
 - Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆45Updated last year
 - Benchmarks to capture important workloads.☆31Updated 9 months ago
 - A Python library that transfers PyTorch tensors between CPU and NVMe☆120Updated 11 months ago
 - Context Manager to profile the forward and backward times of PyTorch's nn.Module☆82Updated 2 years ago