Personal solutions to the Triton Puzzles
☆20 · Jul 18, 2024 · Updated last year
Alternatives and similar repositories for Triton-Puzzles-Solutions
Users who are interested in Triton-Puzzles-Solutions are comparing it to the repositories listed below.
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 · Feb 9, 2026 · Updated last month
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆169 · Aug 14, 2024 · Updated last year
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 9 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- ☆301 · Updated this week
- ☆21 · Mar 3, 2025 · Updated last year
- Write a fast kernel and see how you compare against the best humans and AI on gpumode.com ☆78 · Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs ☆73 · Feb 28, 2026 · Updated last week
- Flash Attention in 300-500 lines of CUDA/C++ ☆36 · Aug 22, 2025 · Updated 6 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆68 · Dec 16, 2025 · Updated 2 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆104 · Jan 8, 2026 · Updated 2 months ago
- ☆53 · Feb 24, 2026 · Updated last week
- Boltz-2 implementation for inference on Tenstorrent hardware ☆73 · Updated this week
- Slimebound character mod for Slay the Spire ☆14 · Jun 30, 2020 · Updated 5 years ago
- Train I3D on NTU-RGB+D dataset in keras ☆12 · Feb 5, 2019 · Updated 7 years ago
- Triton-based Symmetric Memory operators and examples ☆86 · Jan 15, 2026 · Updated last month
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- A creative coding environment where Claude can express itself through generative art using p5.js. See tweet thread for examples: https://… ☆13 · Feb 3, 2026 · Updated last month
- PaiNN in jax ☆11 · Jan 14, 2025 · Updated last year
- Official Pytorch implementation of Chromatic Graph Transformers ☆10 · Jun 14, 2023 · Updated 2 years ago
- Jupyter notebooks from our weekly (or so) hackathons ☆11 · Dec 3, 2024 · Updated last year
- ☆11 · Aug 21, 2023 · Updated 2 years ago
- OpenCode GUI extension for VSCode ☆22 · Feb 11, 2026 · Updated 3 weeks ago
- Triton‑style kernel toolkit for MLX plus a small upstream incubator: prototype, benchmark, and upstream fusions for Apple Silicon ☆36 · Updated this week
- Clustered Compositional Embeddings ☆11 · Oct 25, 2023 · Updated 2 years ago
- [ICML 2022] "Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness" by Tianlong Chen*, Huan Zhang*, Zhenyu Zhang, Shiyu… ☆17 · Jun 22, 2022 · Updated 3 years ago
- Benchmark of glucose predictive models in diabetes ☆11 · Nov 12, 2024 · Updated last year
- ☆14 · Mar 9, 2023 · Updated 2 years ago
- ☆23 · Jul 11, 2025 · Updated 7 months ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- A simple library-less CUDA implementation of the OneSweep sorting algorithm. ☆11 · Feb 26, 2024 · Updated 2 years ago
- ☆10 · May 1, 2023 · Updated 2 years ago
- Official codebase for "Context Aware Deep Learning for Multi Modal Depression Detection" [ICASSP 2019, Oral] ☆11 · Dec 26, 2024 · Updated last year
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- GPU-Accelerated Cosine Similarity for Tandem Mass Spectrometry ☆18 · Nov 4, 2025 · Updated 4 months ago
- [TMLR 2025] Stability-Aware Training of Machine Learning Force Fields with Differentiable Boltzmann Estimators ☆16 · Nov 20, 2025 · Updated 3 months ago
- GeekGameBoard (GGB) is a small framework for building board and card games. It's based on Apple's Core Animation framework. ☆21 · Mar 14, 2013 · Updated 12 years ago
- Pytorch routines for (Ker)nel (Mac)hines ☆11 · Oct 10, 2025 · Updated 4 months ago
- Silly twitter torch implementations. ☆46 · Oct 14, 2022 · Updated 3 years ago