GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
☆374 · Jan 12, 2026 · Updated last month
Alternatives and similar repositories for dietgpu
Users that are interested in dietgpu are comparing it to the libraries listed below
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloade… ☆614 · Sep 11, 2024 · Updated last year
- A library for distributed ML training with PyTorch ☆366 · Dec 12, 2022 · Updated 3 years ago
- ☆15 · Aug 3, 2021 · Updated 4 years ago
- Universal Python binding for the LMDB 'Lightning' Database ☆13 · Nov 7, 2017 · Updated 8 years ago
- A case study of efficient training of large language models using commodity hardware. ☆68 · Aug 4, 2022 · Updated 3 years ago
- ☆15 · Jun 10, 2022 · Updated 3 years ago
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,075 · Apr 17, 2024 · Updated last year
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch ☆18 · Dec 21, 2022 · Updated 3 years ago
- Memory-efficient transformer. Work in progress. ☆19 · Sep 17, 2022 · Updated 3 years ago
- Pedagogical codebase for a simplified score-based generative model design, with training loop ☆40 · Aug 28, 2021 · Updated 4 years ago
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. ☆81 · Mar 17, 2022 · Updated 3 years ago
- Generic image compressor for machine learning. Pytorch code for our paper "Lossy compression for lossless prediction". ☆121 · Aug 19, 2022 · Updated 3 years ago
- A library for unit scaling in PyTorch ☆133 · Jul 11, 2025 · Updated 7 months ago
- PyTorch examples powered by Lightning ☆12 · Dec 28, 2022 · Updated 3 years ago
- A Z-order (Morton-code) like coordinate system as template library for arbitrary dimensions. ☆12 · Nov 24, 2019 · Updated 6 years ago
- Some of the fastest decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64 ☆19 · Sep 3, 2024 · Updated last year
- Code for reproducing the experiments on large-scale pre-training and transfer learning for the paper "Effect of large-scale pre-training … ☆19 · May 29, 2022 · Updated 3 years ago
- PyTorch extensions for high performance and large scale training. ☆3,400 · Apr 26, 2025 · Updated 10 months ago
- An open source implementation of CLIP. ☆33 · Nov 7, 2022 · Updated 3 years ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆379 · Updated this week
- ☆11 · Oct 3, 2021 · Updated 4 years ago
- Official Pytorch Implementation for the paper 'SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients' ☆17 · Jan 12, 2022 · Updated 4 years ago
- Python Research Framework ☆106 · Nov 3, 2022 · Updated 3 years ago
- ☆21 · Mar 15, 2023 · Updated 2 years ago
- Majesty Diffusion by @Dango233 and @apolinario (@multimodalart) ☆25 · Jul 26, 2022 · Updated 3 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆300 · Jun 7, 2025 · Updated 8 months ago
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,534 · Feb 24, 2026 · Updated last week
- ☆14 · May 3, 2022 · Updated 3 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 · Mar 1, 2018 · Updated 8 years ago
- Simple repository contribution statistics ☆15 · Updated this week
- ☆13 · Jun 25, 2022 · Updated 3 years ago
- OSLO: Open Source framework for Large-scale model Optimization ☆309 · Aug 25, 2022 · Updated 3 years ago
- ☆251 · Jul 25, 2024 · Updated last year
- New generation entropy codecs: Finite State Entropy and Huff0 ☆1,470 · Mar 21, 2024 · Updated last year
- Triton Server Component for lightning.ai ☆14 · Feb 15, 2023 · Updated 3 years ago
- Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code. ☆1,829 · Jan 18, 2026 · Updated last month
- maximal update parametrization (µP) ☆1,686 · Jul 17, 2024 · Updated last year
- Pipeline Parallelism for PyTorch ☆785 · Aug 21, 2024 · Updated last year
- ☆27 · Mar 13, 2021 · Updated 4 years ago