Samples of good AI-generated CUDA kernels
☆105 · May 30, 2025 · Updated last year
Alternatives and similar repositories for good-kernels
Users that are interested in good-kernels are comparing it to the libraries listed below.
- Automated High-Performance GPU Kernel Generation ☆114 · Jun 1, 2026 · Updated last week
- LLM as World Models using Bayesian inference ☆20 · May 27, 2025 · Updated last year
- Generating Efficient AI-Centric Kernels ☆104 · Updated this week
- ☆21 · May 13, 2022 · Updated 4 years ago
- TORCH_TRACE parser for PT2 ☆86 · May 11, 2026 · Updated last month
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆132 · Jun 14, 2025 · Updated 11 months ago
- So, I trained a Llama, a 130M architecture I coded from the ground up, to build a small instruct model from scratch. Trained on FineWeb dataset… ☆18 · Mar 26, 2025 · Updated last year
- Optimizing diffusion for production-ready speeds ☆40 · Jan 10, 2026 · Updated 5 months ago
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆30 · Apr 9, 2026 · Updated 2 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆15 · Apr 30, 2025 · Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,045 · Mar 24, 2026 · Updated 2 months ago
- Official Repo of CudaForge ☆83 · Dec 2, 2025 · Updated 6 months ago
- ☆40 · Jun 3, 2026 · Updated last week
- ☆26 · Jun 5, 2026 · Updated last week
- Development containers for triton and triton-cpu ☆28 · Jun 3, 2026 · Updated last week
- An awesome list that curates the best Flet tools, tutorials, blogs and more. ☆10 · Jan 8, 2023 · Updated 3 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆93 · Sep 12, 2025 · Updated 9 months ago
- The official implementation of "EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models" ☆21 · Jul 8, 2025 · Updated 11 months ago
- Utility that parses the stack sizes section from ELF objects and displays the preallocated stack size of each function. ☆14 · Jan 15, 2020 · Updated 6 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆210 · Updated this week
- ☆42 · Sep 8, 2023 · Updated 2 years ago
- ☆50 · Jan 28, 2025 · Updated last year
- smolbox of recipes ☆29 · Apr 23, 2025 · Updated last year
- // clone this repo with --depth=1 to save disk size // toolchain compatible with Ubuntu 20.04+ // ☆15 · Apr 28, 2022 · Updated 4 years ago
- plane sweep filtering genome alignments ☆24 · May 23, 2026 · Updated 2 weeks ago
- Revolutionary AI-powered 4K desktop wallpaper generator with DeepSeek-R1, FLUX-Dev, and 8K supersampling pipeline ☆33 · Aug 6, 2025 · Updated 10 months ago
- This repo contains the benchmarks for Enzyme on GPUs ☆11 · May 28, 2026 · Updated 2 weeks ago
- Demo project for PistonDevelopers/skeletal_animation ☆22 · Nov 14, 2023 · Updated 2 years ago
- Some microbenchmarks and design docs before commencement ☆11 · Feb 1, 2021 · Updated 5 years ago
- Benchmarking LLMs on Typst ☆21 · May 26, 2025 · Updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM ☆72 · Nov 21, 2016 · Updated 9 years ago
- Training AI for Super Smash Bros. Melee ☆34 · Jun 4, 2026 · Updated last week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆155 · May 10, 2025 · Updated last year
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration ☆15 · Sep 14, 2020 · Updated 5 years ago
- High-Performance FP32 GEMM on CUDA devices ☆125 · Jan 21, 2025 · Updated last year
- Tile primitives for speedy kernels ☆3,420 · May 27, 2026 · Updated 2 weeks ago
- GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks ☆36 · Oct 1, 2023 · Updated 2 years ago
- OnePlus 8T Param Read/Write ☆14 · Dec 4, 2020 · Updated 5 years ago