daemyung / practice-triton
The triangle in action! Triton
☆16 · Updated last year
Alternatives and similar repositories for practice-triton:
Users who are interested in practice-triton compare it to the libraries listed below.
- Transformers components, but in Triton ☆32 · Updated last month
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 6 months ago
- ☆43 · Updated last year
- A performance library for machine learning applications. ☆184 · Updated last year
- ☆26 · Updated last year
- Elixir: Train a Large Language Model on a Small GPU Cluster ☆14 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆82 · Updated last year
- Flexibly track outputs and grad-outputs of torch.nn.Module. ☆13 · Updated last year
- Load compute kernels from the Hub ☆115 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆117 · Updated last year
- ring-attention experiments ☆130 · Updated 6 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆116 · Updated this week
- DPO, but faster 🚀 ☆41 · Updated 4 months ago
- PyTorch/XLA SPMD test code for Google TPU ☆23 · Updated last year
- ☆13 · Updated last month
- ☆101 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- Mixed-precision training from scratch with Tensors and CUDA ☆22 · Updated 11 months ago
- Awesome Triton Resources ☆26 · Updated 3 weeks ago
- Make Triton easier ☆47 · Updated 10 months ago
- Training-free, post-training, efficient sub-quadratic-complexity attention, implemented with OpenAI Triton. ☆127 · Updated last week
- ☆14 · Updated 2 months ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆33 · Updated last year
- Lightning support for Intel Habana accelerators. ☆27 · Updated 2 weeks ago
- Implementation of Infini-Transformer in PyTorch ☆110 · Updated 3 months ago
- OSLO: Open Source for Large-scale Optimization ☆175 · Updated last year
- ☆13 · Updated last month
- Experiment in using Tangent to autodiff Triton ☆78 · Updated last year
- Framework to reduce autotune overhead to zero for well-known deployments. ☆65 · Updated last week
- Vocabulary Parallelism ☆17 · Updated last month