alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆20 · Jul 18, 2024 · Updated last year
Alternatives and similar repositories for Triton-Puzzles-Solutions
Users interested in Triton-Puzzles-Solutions are comparing it to the libraries listed below.
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 · Feb 9, 2026 · Updated last week
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆167 · Aug 14, 2024 · Updated last year
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 8 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆75 · Updated this week
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆73 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆72 · Updated this week
- Flash Attention in 300-500 lines of CUDA/C++ ☆36 · Aug 22, 2025 · Updated 5 months ago
- ☆28 · Jan 17, 2025 · Updated last year
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆67 · Dec 16, 2025 · Updated 2 months ago
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆96 · Jan 8, 2026 · Updated last month
- Boltz-2 implementation for inference on Tenstorrent hardware ☆71 · Updated this week
- Triton-based Symmetric Memory operators and examples ☆81 · Jan 15, 2026 · Updated last month
- Train I3D on NTU-RGB+D dataset in keras ☆12 · Feb 5, 2019 · Updated 7 years ago
- Official codebase for "Context Aware Deep Learning for Multi Modal Depression Detection" [ICASSP 2019, Oral] ☆11 · Dec 26, 2024 · Updated last year
- ☆11 · Jun 15, 2019 · Updated 6 years ago
- Make triton easier ☆50 · Jun 12, 2024 · Updated last year
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆50 · Aug 5, 2025 · Updated 6 months ago
- GPU-Accelerated Cosine Similarity for Tandem Mass Spectrometry ☆17 · Nov 4, 2025 · Updated 3 months ago
- PaiNN in jax ☆11 · Jan 14, 2025 · Updated last year
- ☆12 · Aug 26, 2025 · Updated 5 months ago
- OpenCode GUI extension for VSCode ☆19 · Feb 7, 2026 · Updated last week
- ☆10 · Jul 28, 2021 · Updated 4 years ago
- Efficient retrieval head analysis with triton flash attention that supports topK probability ☆13 · Jun 15, 2024 · Updated last year
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?" ☆29 · Jun 25, 2025 · Updated 7 months ago
- Benchmarking scripts for Gaia ☆13 · Apr 10, 2025 · Updated 10 months ago
- A simple Python library for compartment models ☆11 · Aug 23, 2021 · Updated 4 years ago
- ☆36 · Oct 29, 2025 · Updated 3 months ago
- Implementation of various equivariant models in JAX ☆12 · Apr 12, 2024 · Updated last year
- Clustered Compositional Embeddings ☆11 · Oct 25, 2023 · Updated 2 years ago
- A Zen approach to configuring your Python project ☆15 · Feb 5, 2026 · Updated last week
- Benchmark of glucose predictive models in diabetes ☆11 · Nov 12, 2024 · Updated last year
- ☆14 · Mar 9, 2023 · Updated 2 years ago
- GeekGameBoard (GGB) is a small framework for building board and card games. It's based on Apple's Core Animation framework. ☆21 · Mar 14, 2013 · Updated 12 years ago
- Pytorch routines for (Ker)nel (Mac)hines ☆10 · Oct 10, 2025 · Updated 4 months ago
- 🦌 Deep Retention, Winner @ Calhacks ✨🌠 ☆10 · Oct 26, 2024 · Updated last year
- Official Pytorch implementation of Chromatic Graph Transformers ☆10 · Jun 14, 2023 · Updated 2 years ago
- Conditional Linear Dynamical Systems ☆15 · Oct 7, 2025 · Updated 4 months ago