Triton compiler-related materials.
☆42Jan 4, 2025Updated last year
Alternatives and similar repositories for triton-learning-materials
Users that are interested in triton-learning-materials are comparing it to the libraries listed below
- tutorials about polyhedral compilation.☆61Feb 9, 2026Updated last month
- Let coding agents use ncu skills to analyze CUDA programs automatically!☆47Feb 5, 2026Updated last month
- Tiny Container Engine☆11Jan 16, 2023Updated 3 years ago
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA.☆71Feb 18, 2026Updated 2 weeks ago
- ☆85Apr 18, 2025Updated 10 months ago
- ☆26Aug 28, 2024Updated last year
- Go framework for DL model inference and API deployment☆51Dec 16, 2024Updated last year
- A practical way of learning Swizzle☆37Feb 3, 2025Updated last year
- Development repository for the Triton-Linalg conversion☆215Feb 7, 2025Updated last year
- A tool to count operators and parameters of your MXNet-Gluon model.☆22Apr 15, 2020Updated 5 years ago
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…☆55Mar 1, 2026Updated last week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆464Mar 10, 2025Updated 11 months ago
- ☆42Nov 1, 2025Updated 4 months ago
- Some common CUDA kernel implementations (Not the fastest).☆29Dec 5, 2025Updated 3 months ago
- How to optimize SGEMM on a single-threaded ARM CPU, a multi-threaded ARM CPU, and an NVIDIA GPU☆23Jun 29, 2021Updated 4 years ago
- ☆120Apr 2, 2025Updated 11 months ago
- ☆27May 27, 2024Updated last year
- SaccadeNet: mimics how humans locate accurate bounding boxes☆29Jul 10, 2019Updated 6 years ago
- Ahead of Time (AOT) Triton Math Library☆93Updated this week
- All Resources from Stanford CS106B 2021☆24Jul 11, 2025Updated 7 months ago
- ☆97Mar 26, 2025Updated 11 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆84Mar 20, 2023Updated 2 years ago
- For Vast.ai hosts. Prometheus exporter reporting data from your Vast.ai account.☆12Updated this week
- ☆23Jan 27, 2014Updated 12 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆35Jul 28, 2020Updated 5 years ago
- ☆116May 16, 2025Updated 9 months ago
- ☆54Mar 15, 2025Updated 11 months ago
- FlagGems is an operator library for large language models implemented in the Triton Language.☆909Updated this week
- A simplified flash-attention implementation using cutlass, intended for teaching purposes☆58Aug 12, 2024Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆123Dec 25, 2025Updated 2 months ago
- Shared Middle-Layer for Triton Compilation☆331Dec 5, 2025Updated 3 months ago
- ☆11Sep 21, 2022Updated 3 years ago
- 🎉My Collections of CUDA Kernels~☆11Jun 25, 2024Updated last year
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to Pytorch.fx.☆13Apr 7, 2023Updated 2 years ago
- A guide to encoding categorical variables, with implementations and examples in Python.☆11Sep 9, 2020Updated 5 years ago
- https://nnsmith-asplos.rtfd.io Artifact of "NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers" ASPLOS'23☆11Mar 29, 2023Updated 2 years ago
- LaTeX template for ITMO style presentations☆10Jan 19, 2025Updated last year
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 7 months ago
- GEMM☆10Aug 26, 2023Updated 2 years ago