Samples of good AI-generated CUDA kernels
☆104 · May 30, 2025 · Updated 11 months ago
Alternatives and similar repositories for good-kernels
Users who are interested in good-kernels are comparing it to the libraries listed below.
- Automated High-Performance GPU Kernel Generation ☆106 · Apr 20, 2026 · Updated last month
- Generating Efficient AI-Centric Kernels ☆98 · Updated this week
- ☆21 · May 13, 2022 · Updated 4 years ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆128 · Jun 14, 2025 · Updated 11 months ago
- Optimizing diffusion for production-ready speeds ☆39 · Jan 10, 2026 · Updated 4 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Jun 21, 2019 · Updated 6 years ago
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆30 · Apr 9, 2026 · Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,020 · Mar 24, 2026 · Updated last month
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders" ☆23 · Feb 6, 2025 · Updated last year
- Model acceleration / model compression (all labs completed) ☆11 · Dec 24, 2023 · Updated 2 years ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆93 · Sep 12, 2025 · Updated 8 months ago
- Utility that parses stack sizes section from elf objects and displays the preallocated stack size of each function. ☆14 · Jan 15, 2020 · Updated 6 years ago
- SDXL GPU cluster scripts ☆16 · Oct 28, 2023 · Updated 2 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆207 · May 14, 2026 · Updated last week
- ☆42 · Sep 8, 2023 · Updated 2 years ago
- ☆50 · Jan 28, 2025 · Updated last year
- plane sweep filtering genome alignments ☆24 · Apr 18, 2026 · Updated last month
- This repo contains the benchmarks for Enzyme on GPUs ☆11 · Updated this week
- Some microbenchmarks and design docs before commencement ☆11 · Feb 1, 2021 · Updated 5 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆72 · Nov 21, 2016 · Updated 9 years ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆58 · Apr 4, 2025 · Updated last year
- Training AI for Super Smash Bros. Melee ☆34 · Updated this week
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration ☆15 · Sep 14, 2020 · Updated 5 years ago
- High-Performance FP32 GEMM on CUDA devices ☆125 · Jan 21, 2025 · Updated last year
- Tile primitives for speedy kernels ☆3,360 · May 11, 2026 · Updated last week
- GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks ☆36 · Oct 1, 2023 · Updated 2 years ago
- Code related to the ELM neuron. ☆15 · Feb 27, 2024 · Updated 2 years ago
- ☆106 · Mar 6, 2026 · Updated 2 months ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Mar 13, 2023 · Updated 3 years ago
- ☆12 · Oct 19, 2014 · Updated 11 years ago
- sigma-MoE layer ☆21 · Jan 5, 2024 · Updated 2 years ago
- ☆11 · Aug 4, 2022 · Updated 3 years ago
- An Xposed/LSPosed module for disabling the annoying biometrics timeout ☆20 · Aug 24, 2025 · Updated 8 months ago
- Source code for the paper "Do Deep Neural Network Solutions form a Star Domain?" ☆12 · May 26, 2024 · Updated last year
- ☆11 · Nov 13, 2020 · Updated 5 years ago
- DietCode Code Release ☆65 · Jul 21, 2022 · Updated 3 years ago
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI … ☆57 · Feb 10, 2025 · Updated last year
- Print theme for Markdown Preview Enhanced ☆15 · Apr 10, 2023 · Updated 3 years ago
- Speed of Light Analysis for ML Model Runtime ☆66 · Apr 13, 2026 · Updated last month