pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆1,592 · Updated this week
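As a minimal sketch of the weight-only quantization this library provides: torchao's own one-line `quantize_(model, config)` API is the native entry point, but since torchao may not be installed, the example below uses core PyTorch's closely related dynamic int8 quantization from `torch.ao.quantization` as an illustrative stand-in (an assumption: the model sizes here are arbitrary).

```python
import torch
import torch.nn as nn

# A small float32 model; only nn.Linear layers will be quantized.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Dynamic quantization: weights stored as int8, activations quantized
# on the fly at inference time. CPU-only, no calibration data needed.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 64)
out = qmodel(x)  # same interface as the float model
```

With torchao installed, the equivalent native call would be `quantize_` with an int8 weight-only config applied in place to the model.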
Related projects
Alternatives and complementary repositories for ao
- A native PyTorch library for large model training ☆2,635 · Updated this week
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… ☆1,199 · Updated this week
- A PyTorch quantization backend for Optimum ☆828 · Updated last week
- Tile primitives for speedy kernels ☆1,661 · Updated this week
- Puzzles for learning Triton ☆1,138 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- FlashInfer: Kernel Library for LLM Serving ☆1,461 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,265 · Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆804 · Updated 3 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ☆1,982 · Updated this week
- Efficient Triton Kernels for LLM Training ☆3,477 · Updated this week
- Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton ☆1,346 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆573 · Updated this week
- Schedule-Free Optimization in PyTorch ☆1,900 · Updated 2 weeks ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,534 · Updated last month
- Pipeline Parallelism for PyTorch ☆725 · Updated 3 months ago
- TinyChatEngine: On-Device LLM Inference Library ☆748 · Updated 4 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆702 · Updated this week
- Transform datasets at scale. Optimize datasets for fast AI model training. ☆368 · Updated this week
- NanoGPT (124M) in 5 minutes ☆1,269 · Updated this week
- PyTorch native finetuning library ☆4,346 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆691 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,138 · Updated last month
- Run PyTorch LLMs locally on servers, desktop, and mobile ☆3,393 · Updated this week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,436 · Updated 3 weeks ago
- UNet diffusion model in pure CUDA ☆584 · Updated 4 months ago
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆640 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆626 · Updated 7 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆637 · Updated 2 months ago