daemyung / practice-triton
Triangles in practice! Triton
☆16 · Updated last year
Alternatives and similar repositories for practice-triton
Users interested in practice-triton are comparing it to the repositories listed below.
- Flexibly track outputs and grad-outputs of torch.nn.Module. ☆13 · Updated last year
- Lightning support for Intel Habana accelerators. ☆27 · Updated last month
- PyTorch/XLA SPMD test code on Google TPU ☆23 · Updated last year
- Calculating expected time for training LLMs. ☆38 · Updated 2 years ago
- ☆100 · Updated last year
- ☆26 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆59 · Updated 7 months ago
- ☆108 · Updated last year
- Transformers components but in Triton ☆33 · Updated 3 weeks ago
- ☆15 · Updated 3 months ago
- OSLO: Open Source for Large-scale Optimization ☆174 · Updated last year
- A performance library for machine learning applications. ☆183 · Updated last year
- ☆13 · Updated 2 months ago
- Some common Hugging Face transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference. ☆62 · Updated 4 months ago
- Elixir: Train a Large Language Model on a Small GPU Cluster ☆14 · Updated last year
- Awesome Triton Resources ☆28 · Updated last month
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆33 · Updated last year
- Automatic differentiation for Triton kernels ☆11 · Updated 2 months ago
- Experiment of using Tangent to autodiff Triton ☆79 · Updated last year
- ☆81 · Updated last year
- Load compute kernels from the Hub ☆139 · Updated last week
- ☆60 · Updated 3 months ago
- Data processing system for polyglot ☆91 · Updated last year
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE. ☆17 · Updated this week
- ☆78 · Updated 11 months ago
- Simple implementation of muP, based on the Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam. ☆77 · Updated 10 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- Training-free post-training efficient sub-quadratic complexity attention, implemented with OpenAI Triton. ☆133 · Updated this week
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset) ☆20 · Updated last year