pytorch-labs / superblock
A block-oriented training approach for inference-time optimization.
☆32 · Updated 7 months ago
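The block-oriented idea can be illustrated with a short sketch. This is a minimal, hypothetical example of block-wise weight sparsity in plain PyTorch, not superblock's actual API; the block size, scoring rule, and sparsity level are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of block-wise weight sparsity (not superblock's API):
# score 64x64 tiles of a linear weight, drop the weakest half, and run
# inference with the masked weight.
torch.manual_seed(0)
block = 64
weight = torch.randn(512, 512)

# Score each (block x block) tile by its L1 norm.
tiles = weight.unfold(0, block, block).unfold(1, block, block)  # (8, 8, 64, 64)
scores = tiles.abs().sum(dim=(-2, -1))                          # (8, 8)

# Keep roughly the top 50% of tiles, then expand the tile-level mask
# back to element level.
keep = scores >= scores.flatten().median()
mask = keep.repeat_interleave(block, 0).repeat_interleave(block, 1)

x = torch.randn(4, 512)
y = F.linear(x, weight * mask)  # pruned blocks contribute nothing

# The masked weight can also be stored in a block-sparse (BSR) layout so an
# inference-time kernel can skip the zeroed blocks entirely.
w_bsr = (weight * mask).to_sparse_bsr(blocksize=(block, block))
```

The point of pruning whole tiles rather than individual weights is that zeroed blocks map directly onto block-sparse storage and kernels, which is where the inference-time speedup comes from.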
Alternatives and similar repositories for superblock:
Users interested in superblock are also comparing it to the libraries listed below.
- ☆157 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- ☆65 · Updated 2 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆104 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated last year
- ☆101 · Updated 7 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆152 · Updated 10 months ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆84 · Updated 2 months ago
- ☆39 · Updated 8 months ago
- Extensible collectives library in Triton ☆84 · Updated 6 months ago
- ☆28 · Updated 11 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆71 · Updated 6 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆104 · Updated 5 months ago
- Work in progress. ☆50 · Updated last week
- ☆95 · Updated 9 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆234 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆77 · Updated 4 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops ☆28 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆59 · Updated 5 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ☆101 · Updated 8 months ago
- ☆36 · Updated 4 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs ☆157 · Updated 8 months ago
- ☆46 · Updated last week
- ☆73 · Updated 4 months ago
- ☆141 · Updated 2 years ago
- ☆202 · Updated 3 years ago