pytorch-labs / superblock
A block-oriented training approach for inference-time optimization.
☆30 · Updated 3 months ago
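Roughly, "block oriented" means weights are pruned in contiguous tiles rather than element by element, so inference kernels can skip whole blocks at once. Below is a minimal, hypothetical sketch of block-wise magnitude masking in PyTorch; the block size, keep ratio, and L1 scoring rule are illustrative assumptions, not superblock's actual algorithm:

```python
import torch

def block_mask(weight: torch.Tensor, block: int = 32, keep_ratio: float = 0.5) -> torch.Tensor:
    """Zero out entire (block x block) tiles of `weight`, keeping the
    tiles with the largest L1 norm. Illustrative sketch only."""
    out_f, in_f = weight.shape
    assert out_f % block == 0 and in_f % block == 0
    # View the matrix as a grid of tiles and score each tile by its L1 norm.
    tiles = weight.reshape(out_f // block, block, in_f // block, block)
    scores = tiles.abs().sum(dim=(1, 3))                  # (out_f/block, in_f/block)
    k = max(1, int(keep_ratio * scores.numel()))
    thresh = scores.flatten().topk(k).values.min()
    keep = (scores >= thresh).to(weight.dtype)[:, None, :, None]  # broadcast over tiles
    return (tiles * keep).reshape(out_f, in_f)

w = torch.randn(128, 256)
w_sparse = block_mask(w, block=32, keep_ratio=0.25)  # 75% of tiles zeroed
```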
Related projects
Alternatives and complementary repositories for superblock
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 · Updated 8 months ago
- Experimental PyTorch-native float8 training UX ☆211 · Updated 3 months ago
- Extensible collectives library in Triton ☆72 · Updated last month
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 · Updated this week
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆111 · Updated 5 months ago
- An experiment using Tangent to autodiff Triton ☆72 · Updated 9 months ago
- A library for unit scaling in PyTorch ☆105 · Updated 2 weeks ago
- CUDA implementation of autoregressive linear attention, incorporating the latest research findings ☆43 · Updated last year
- Sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 · Updated 10 months ago
- LLM KV cache compression made easy ☆64 · Updated last week
- Applied AI experiments and examples for PyTorch ☆166 · Updated 3 weeks ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 · Updated this week
- CPU kernel generation for LLM inference ☆25 · Updated last year
- Cataloging released Triton kernels ☆134 · Updated 2 months ago
- Patch convolution to avoid the large GPU memory usage of Conv2D ☆79 · Updated 5 months ago
- Source code for "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 · Updated last month
- Megatron's multi-modal data loader ☆136 · Updated this week
- Boosting 4-bit inference kernels with 2:4 sparsity (a minimal 2:4 pruning sketch follows this list) ☆51 · Updated 2 months ago
- High-speed GEMV kernels with up to 2.7x speedup over the PyTorch baseline ☆90 · Updated 4 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 · Updated this week
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆87 · Updated last month
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆112 · Updated 8 months ago
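Several entries above lean on 2:4 (semi-structured) sparsity, where two of every four consecutive weights are zero so that Ampere-or-newer sparse tensor cores can skip them. Here is a minimal sketch of pruning a weight matrix to a 2:4 pattern and handing it to PyTorch's prototype semi-structured sparse support; the magnitude-based pruning rule is an illustrative assumption, not the scheme used by the repositories above:

```python
import torch
import torch.nn.functional as F

def prune_2_4(w: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every group of 4
    consecutive weights along the last dim; zero the rest."""
    groups = w.reshape(-1, 4)
    idx = groups.abs().topk(2, dim=1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, idx, True)
    return (groups * mask).reshape_as(w)

w = torch.randn(128, 128, dtype=torch.float16)
w_24 = prune_2_4(w)

# On a recent PyTorch (2.1+ prototype API) with an Ampere-or-newer GPU, the
# pruned matrix can be converted so linear layers dispatch to sparse tensor cores.
if torch.cuda.is_available():
    from torch.sparse import to_sparse_semi_structured
    w_sparse = to_sparse_semi_structured(w_24.cuda())
    x = torch.randn(64, 128, dtype=torch.float16, device="cuda")
    y = F.linear(x, w_sparse)  # x @ w_24.T via the 2:4 sparse kernel
```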