facebookexperimental / protoquant

Prototype routines for GPU quantization written using PyTorch.

☆21

Alternatives and similar repositories for protoquant

Users who are interested in protoquant are comparing it to the libraries listed below.

lianakoleva / no-libtorch-compile
☆21 Updated 7 months ago
pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆108 Updated last year
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated 2 years ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
meta-pytorch / tlparse
TORCH_LOGS parser for PT2
☆61 Updated 3 weeks ago
meta-pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆180 Updated last month
kshitij12345 / torchnnprofiler
Context Manager to profile the forward and backward times of PyTorch's nn.Module
☆82 Updated 2 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated 3 weeks ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
pytorch / rfcs
PyTorch RFCs (experimental)
☆135 Updated 4 months ago
deepspeedai / DeepSpeed-Kernels
☆72 Updated 6 months ago
ezyang / torchdbg
PyTorch centric eager mode debugger
☆48 Updated 9 months ago
lernapparat / torchhacks
Hacks for PyTorch
☆19 Updated 2 years ago
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆120 Updated 10 months ago
stanford-futuredata / stk
☆113 Updated last year
UmerHA / triton_util
Make triton easier
☆47 Updated last year
Jokeren / triton-samples
☆28 Updated 8 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 Updated last year
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆84 Updated 3 weeks ago
cchan / tccl
extensible collectives library in triton
☆89 Updated 6 months ago
meta-pytorch / superblock
A block oriented training approach for inference time optimization.
☆34 Updated last year
marsupialtail / sparsednn
Fast sparse deep learning on CPUs
☆56 Updated 3 years ago
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆45 Updated last month
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
meta-pytorch / torchfix
TorchFix - a linter for PyTorch-using code with autofix support
☆148 Updated last month
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆60 Updated this week
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114 Updated 2 years ago