facebookresearch / dietgpu
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
☆322 · Updated 3 months ago
Alternatives and similar repositories for dietgpu:
Users who are interested in dietgpu are comparing it to the libraries listed below:
- A GPU performance profiling tool for PyTorch models ☆504 · Updated 3 years ago
- ☆288 · Updated last week
- A library to analyze PyTorch traces. ☆342 · Updated this week
- A library of GPU kernels for sparse matrix operations. ☆260 · Updated 4 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆253 · Updated 2 years ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆382 · Updated this week
- Fast Block Sparse Matrices for Pytorch ☆546 · Updated 4 years ago
- An open-source efficient deep learning framework/compiler, written in python. ☆688 · Updated 2 weeks ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆179 · Updated 2 months ago
- DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight … ☆235 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆259 · Updated 2 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆154 · Updated 3 months ago
- PyTorch RFCs (experimental) ☆130 · Updated 6 months ago
- Customized matrix multiplication kernels ☆53 · Updated 3 years ago
- ☆105 · Updated last week
- Python bindings for NVTX ☆66 · Updated last year
- Block-sparse primitives for PyTorch ☆153 · Updated 3 years ago
- ☆184 · Updated 2 weeks ago
- Named tensors with first-class dimensions for PyTorch ☆321 · Updated last year
- Lightweight and Parallel Deep Learning Framework ☆264 · Updated 2 years ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆349 · Updated this week
- This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as … ☆193 · Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆221 · Updated 7 months ago
- ☆157 · Updated last year
- PyTorch implementation of L2L execution algorithm ☆107 · Updated 2 years ago
- oneCCL Bindings for Pytorch* ☆89 · Updated this week
- Convert nvprof profiles into about:tracing compatible JSON files ☆68 · Updated 3 years ago
- Torch Distributed Experimental ☆115 · Updated 7 months ago
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ☆776 · Updated this week
- Research and development for optimizing transformers ☆125 · Updated 4 years ago