A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆595 · Updated Aug 12, 2025
Alternatives and similar repositories for attorch
Users interested in attorch are comparing it to the libraries listed below.
- Cataloging released Triton kernels. (☆296, updated Sep 9, 2025)
- Puzzles for learning Triton (☆2,324, updated Nov 18, 2024)
- Tile primitives for speedy kernels (☆3,202, updated Feb 24, 2026)
- Experiment of using Tangent to autodiff Triton (☆82, updated Jan 22, 2024)
- Applied AI experiments and examples for PyTorch (☆319, updated Aug 22, 2025)
- Triton-based implementation of Sparse Mixture of Experts. (☆268, updated Oct 3, 2025)
- Collection of kernels written in the Triton language (☆178, updated Jan 27, 2026)
- Efficient Triton kernels for LLM training (☆6,189, updated this week)
- Accelerated First-Order Parallel Associative Scan (☆195, updated Jan 7, 2026)
- Ring attention implementation with flash attention (☆987, updated Sep 10, 2025)
- 🚀 Efficient implementations of state-of-the-art linear attention models (☆4,474, updated this week)
- Helpful tools and examples for working with flex-attention (☆1,140, updated Feb 8, 2026)
- Fast low-bit matmul kernels in Triton (☆436, updated Feb 1, 2026)
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. (☆327, updated this week)
- A collection of memory-efficient attention operators implemented in the Triton language. (☆288, updated Jun 5, 2024)
- Extensible collectives library in Triton (☆96, updated Mar 31, 2025)
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel (☆2,148, updated Feb 23, 2026)
- A PyTorch-native platform for training generative AI models (☆5,098, updated Feb 28, 2026)
- PyTorch-native quantization and sparsity for training and inference (☆2,707, updated this week)
- FlexAttention w/ FlashAttention3 support (☆27, updated Oct 5, 2024)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable. (☆1,585, updated Jan 28, 2026)
- train with kittens! (☆63, updated Oct 25, 2024)
- Annotated version of the Mamba paper (☆497, updated Feb 27, 2024)
- Minimalistic large language model 3D-parallelism training (☆2,579, updated Feb 19, 2026)
- Transformers components, but in Triton (☆34, updated May 9, 2025)
- Framework to reduce autotune overhead to zero for well-known deployments. (☆97, updated Sep 19, 2025)
- FlashInfer: kernel library for LLM serving (☆5,057, updated this week)
- TensorDict is a PyTorch-dedicated tensor container. (☆1,011, updated this week)
- Shared middle layer for Triton compilation (☆329, updated Dec 5, 2025)
- GPU-programming-related news and material links (☆2,010, updated Sep 17, 2025)
- FlagGems is an operator library for large language models implemented in the Triton language. (☆909, updated this week)
- GPTQ inference Triton kernel (☆321, updated May 18, 2023)
- UNet diffusion model in pure CUDA (☆657, updated Jun 28, 2024)
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS (☆490, updated Jan 20, 2026)