facebookresearch / dietgpu

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
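The entry above names the technique without showing it. The following is a minimal, CPU-only Python sketch of rANS (range ANS, one member of the ANS family), intended only to illustrate how an ANS coder folds symbols into a single integer state and pops them back out. It is not dietgpu's API or its GPU algorithm. It assumes a static integer frequency table known to both encoder and decoder, and it uses Python's unbounded integers in place of the streaming renormalization that a real coder needs. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Minimal rANS (range asymmetric numeral system) sketch.
# Assumptions: a static integer frequency table shared by both sides, and
# Python's unbounded ints as the coder state (no streaming renormalization).
# Hypothetical helper names; not dietgpu's API.


def build_cumulative(freqs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return the cumulative start c_s of each symbol and the total M."""
    cum: Dict[str, int] = {}
    total = 0
    for sym in sorted(freqs):
        cum[sym] = total
        total += freqs[sym]
    return cum, total


def rans_encode(symbols: List[str], freqs: Dict[str, int]) -> int:
    """Fold the symbols into a single integer state x."""
    cum, m = build_cumulative(freqs)
    x = 1  # nonzero start, so no symbol can ever map the state to 0
    # ANS behaves like a stack, so encode in reverse to decode in order.
    for sym in reversed(symbols):
        f = freqs[sym]
        x = (x // f) * m + cum[sym] + (x % f)
    return x


def rans_decode(x: int, count: int, freqs: Dict[str, int]) -> List[str]:
    """Pop `count` symbols back out of the state x."""
    cum, m = build_cumulative(freqs)
    # Slot table: each value in [0, M) maps to the symbol that owns it.
    slot_to_sym = [sym for sym in sorted(freqs) for _ in range(freqs[sym])]
    out: List[str] = []
    for _ in range(count):
        slot = x % m
        sym = slot_to_sym[slot]
        x = freqs[sym] * (x // m) + slot - cum[sym]
        out.append(sym)
    return out


# Usage: 'a' is three times as likely as 'b', so M = 4.
freqs = {"a": 3, "b": 1}
message = list("aababaaa")
state = rans_encode(message, freqs)
assert rans_decode(state, len(message), freqs) == message
```

Each symbol grows the state by roughly log2(M / f_s) bits, so frequent symbols cost fewer bits. A practical coder would typically keep the state in a fixed-width machine word and flush bits as it grows; that renormalization step is omitted here for brevity.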

☆338

Alternatives and similar repositories for dietgpu

Users who are interested in dietgpu compare it to the libraries listed below.


facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆391 · Updated this week
facebookresearch / diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight …
☆235 · Updated 2 years ago
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆258 · Updated 2 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆269 · Updated 3 years ago
google / aqt
☆318 · Updated last week
hidet-org / hidet
An open-source efficient deep learning framework/compiler, written in Python.
☆704 · Updated last week
NVIDIA / PyProf
A GPU performance profiling tool for PyTorch models
☆503 · Updated 3 years ago
huggingface / pytorch_block_sparse
Fast Block Sparse Matrices for PyTorch
☆548 · Updated 4 years ago
facebookresearch / torchdim
Named tensors with first-class dimensions for PyTorch
☆332 · Updated 2 years ago
jax-ml / jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
☆403 · Updated this week
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆265 · Updated 4 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆132 · Updated 3 years ago
facebookresearch / FBTT-Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as …
☆194 · Updated 2 years ago
pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆180 · Updated 2 weeks ago
spcl / substation
Research and development for optimizing transformers
☆129 · Updated 4 years ago
microsoft / varuna
☆250 · Updated 11 months ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆56 · Updated 3 years ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 · Updated 10 months ago
jax-ml / ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
☆276 · Updated 3 weeks ago
pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,053 · Updated last year
utsaslab / MONeT
MONeT framework for reducing memory consumption of DNN training
☆173 · Updated 4 years ago
openxla / shardy
MLIR-based partitioning system
☆97 · Updated this week
facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆715 · Updated 2 years ago
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆821 · Updated this week
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆170 · Updated this week
ptillet / torch-blocksparse
Block-sparse primitives for PyTorch
☆156 · Updated 4 years ago
pytorch / nestedtensor
[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
☆251 · Updated 2 years ago
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆362 · Updated 4 months ago
TezRomacH / layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
☆107 · Updated 2 years ago
facebookresearch / NeuralCompression
A collection of tools for neural compression enthusiasts.
☆561 · Updated 9 months ago