Samples of good AI generated CUDA kernels
☆105 · May 30, 2025 · Updated last year
Alternatives and similar repositories for good-kernels
Users that are interested in good-kernels are comparing it to the libraries listed below.
- Automated High-Performance GPU Kernel Generation · ☆119 · Jun 1, 2026 · Updated last month
- Generating Efficient AI-Centric Kernels · ☆108 · Updated this week
- ☆21 · May 13, 2022 · Updated 4 years ago
- TORCH_TRACE parser for PT2 · ☆87 · May 11, 2026 · Updated last month
- Optimizing diffusion for production-ready speeds · ☆40 · Jan 10, 2026 · Updated 5 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches · ☆15 · Jun 21, 2019 · Updated 7 years ago
- A collection of GPU experiments and benchmarks for my personal understanding and research. · ☆31 · Updated this week
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" · ☆16 · Apr 30, 2025 · Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) · ☆1,080 · Mar 24, 2026 · Updated 3 months ago
- Official Repo of CudaForge · ☆84 · Dec 2, 2025 · Updated 7 months ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders" · ☆23 · Feb 6, 2025 · Updated last year
- ☆26 · Jun 8, 2026 · Updated 3 weeks ago
- Development containers for triton and triton-cpu · ☆28 · Jun 23, 2026 · Updated last week
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" · ☆92 · Sep 12, 2025 · Updated 9 months ago
- [TIP 2026] The official implementation of "EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models" · ☆21 · Jul 8, 2025 · Updated 11 months ago
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers) · ☆35 · Mar 20, 2026 · Updated 3 months ago
- Utility that parses stack sizes section from elf objects and displays the preallocated stack size of each function. · ☆14 · Jan 15, 2020 · Updated 6 years ago
- SDXL GPU cluster scripts · ☆16 · Oct 28, 2023 · Updated 2 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels · ☆212 · Updated this week
- ☆42 · Sep 8, 2023 · Updated 2 years ago
- ☆69 · Jun 16, 2021 · Updated 5 years ago
- ☆51 · Jan 28, 2025 · Updated last year
- // clone this repo with --depth=1 to save disk size // toolchain compatible with Ubuntu 20.04+ // · ☆15 · Apr 28, 2022 · Updated 4 years ago
- plane sweep filtering genome alignments · ☆24 · May 23, 2026 · Updated last month
- An insanely secure password manager. · ☆17 · Mar 10, 2026 · Updated 3 months ago
- This repo contains the benchmarks for Enzyme on GPU's · ☆11 · May 28, 2026 · Updated last month
- A torch compile backend for multi-targets · ☆50 · May 27, 2026 · Updated last month
- Open Source Replication of Anthropic's Alignment Faking Paper · ☆58 · Apr 4, 2025 · Updated last year
- Training AI for Super Smash Bros. Melee · ☆36 · Jun 18, 2026 · Updated last week
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration · ☆15 · Sep 14, 2020 · Updated 5 years ago
- High-Performance FP32 GEMM on CUDA devices · ☆126 · Jan 21, 2025 · Updated last year
- Tile primitives for speedy kernels · ☆3,497 · Jun 15, 2026 · Updated 2 weeks ago
- ☆15 · May 5, 2025 · Updated last year
- Allows two LLMs to communicate and run code in the terminal · ☆28 · Dec 8, 2024 · Updated last year
- Python utility to convert PyTorch model weights from '.bin' to '.safetensors' format. · ☆18 · Sep 19, 2025 · Updated 9 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C · ☆54 · Mar 21, 2025 · Updated last year
- itertree python package - full featured tree data structure · ☆15 · Sep 8, 2025 · Updated 9 months ago
- Code related to the ELM neuron. · ☆15 · Feb 27, 2024 · Updated 2 years ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks · ☆15 · Sep 18, 2020 · Updated 5 years ago