daemyung / practice-triton
Triangles in action! Triton
☆15 · Updated 9 months ago
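practice-triton is a hands-on Triton tutorial repository. For context, below is a minimal sketch of the kind of kernel such tutorials typically start with: an element-wise vector add following the standard Triton tutorial pattern. The kernel and helper names are illustrative and are not taken from this repository's code.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask the tail block so out-of-bounds lanes neither read nor write.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch a 1D grid with enough program instances to cover every element.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

Calling `add(torch.randn(4096, device="cuda"), torch.randn(4096, device="cuda"))` exercises the kernel end to end; the masked load/store is the part most Triton tutorials emphasize first, since it makes the kernel correct for sizes that are not a multiple of the block size.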
Related projects
Alternatives and complementary repositories for practice-triton
- Flexibly track outputs and grad-outputs of torch.nn.Module. ☆13 · Updated last year
- ☆149 · Updated this week
- A performance library for machine learning applications. ☆180 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 · Updated last month
- ☆39 · Updated 10 months ago
- PyTorch/XLA SPMD test code on Google TPU ☆21 · Updated 7 months ago
- ☆77 · Updated 5 months ago
- BCQ tutorial for transformers ☆16 · Updated last year
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆112 · Updated 8 months ago
- Elixir: Train a Large Language Model on a Small GPU Cluster ☆13 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24) ☆50 · Updated 7 months ago
- ☆100 · Updated last year
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam. ☆68 · Updated 3 months ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆29 · Updated 6 months ago
- Experiment of using Tangent to autodiff Triton ☆72 · Updated 10 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆147 · Updated 4 months ago
- Transformers components but in Triton ☆27 · Updated this week
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆86 · Updated 9 months ago
- ☆22 · Updated 10 months ago
- Lightning support for Intel Habana accelerators. ☆25 · Updated this week
- ☆44 · Updated 11 months ago
- ☆74 · Updated 11 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆51 · Updated 2 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆61 · Updated 7 months ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection ☆57 · Updated this week
- Some common Hugging Face transformers in maximal update parametrization (µP) ☆76 · Updated 2 years ago
- ☆45 · Updated 9 months ago
- Calculating the expected time for training an LLM. ☆38 · Updated last year