Samples of good AI generated CUDA kernels
☆103May 30, 2025Updated 11 months ago
Alternatives and similar repositories for good-kernels
Users that are interested in good-kernels are comparing it to the libraries listed below.
- Automated High-Performance GPU Kernel Generation☆101Apr 20, 2026Updated last week
- LLM as World Models using Bayesian inference☆17May 27, 2025Updated 11 months ago
- TORCH_TRACE parser for PT2☆85Updated this week
- So, I trained a Llama a 130M architecture I coded from ground up to build a small instruct model from scratch. Trained on FineWeb dataset…☆17Mar 26, 2025Updated last year
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- Official Repo of CudaForge☆76Dec 2, 2025Updated 5 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"☆15Apr 30, 2025Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)☆959Mar 24, 2026Updated last month
- ☆35Updated this week
- ☆24Updated this week
- Development containers for triton and triton-cpu☆27Updated this week
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers)☆31Mar 20, 2026Updated last month
- An awesome list that curates the best Flet tools, tutorials, blogs and more.☆10Jan 8, 2023Updated 3 years ago
- ☆13Oct 17, 2024Updated last year
- The official implementation of "EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models"☆21Jul 8, 2025Updated 9 months ago
- Utility that parses stack sizes section from elf objects and displays the preallocated stack size of each function.☆14Jan 15, 2020Updated 6 years ago
- SDXL GPU cluster scripts☆16Oct 28, 2023Updated 2 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆203Updated this week
- ☆42Sep 8, 2023Updated 2 years ago
- ☆69Jun 16, 2021Updated 4 years ago
- ☆50Jan 28, 2025Updated last year
- // clone this repo with --depth=1 to save disk size // toolchain compatible with Ubuntu 20.04+ //☆15Apr 28, 2022Updated 4 years ago
- plane sweep filtering genome alignments☆24Apr 18, 2026Updated 2 weeks ago
- An insanely secure password manager.☆17Mar 10, 2026Updated last month
- A torch compile backend for multi-targets☆49Apr 2, 2026Updated last month
- Benchmarking LLMs on Typst☆20May 26, 2025Updated 11 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆56Apr 4, 2025Updated last year
- Training AI for Super Smash Bros. Melee☆34Mar 27, 2025Updated last year
- ☆22Apr 17, 2026Updated 2 weeks ago
- Tile primitives for speedy kernels☆3,326Apr 25, 2026Updated last week
- High-Performance FP32 GEMM on CUDA devices☆122Jan 21, 2025Updated last year
- Allows two LLMs to communicate and run code in the terminal☆28Dec 8, 2024Updated last year
- Python utility to convert PyTorch model weights from '.bin' to '.safetensors' format.☆18Sep 19, 2025Updated 7 months ago
- vLLM Daily Summarization of Merged PRs☆50Updated this week
- Lightweight Llama 3 8B Inference Engine in CUDA C☆54Mar 21, 2025Updated last year
- itertree python package - full featured tree data structure☆15Sep 8, 2025Updated 7 months ago
- An Xposed/LSPosed module for disabling the annoying biometrics timeout☆20Aug 24, 2025Updated 8 months ago
- ☆12Apr 4, 2024Updated 2 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆17Mar 13, 2023Updated 3 years ago