Samples of good AI-generated CUDA kernels
☆100May 30, 2025Updated 9 months ago
Alternatives and similar repositories for good-kernels
Users who are interested in good-kernels are comparing it to the libraries listed below
- LLM as World Models using Bayesian inference☆16May 27, 2025Updated 9 months ago
- A simple introduction to FPGAs☆12Nov 17, 2020Updated 5 years ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators☆115Jun 14, 2025Updated 8 months ago
- Training AI for Super Smash Bros. Melee☆32Mar 27, 2025Updated 11 months ago
- A proof-of-concept for building Orbiter spaceflight simulator addons in Rust☆13Jan 30, 2022Updated 4 years ago
- TORCH_TRACE parser for PT2☆78Updated this week
- SDXL GPU cluster scripts☆16Oct 28, 2023Updated 2 years ago
- An LLM-based AI agent that automatically writes correct and efficient GPU kernels.☆68Updated this week
- ☆14May 5, 2025Updated 9 months ago
- A package dedicated to running benchmark agreement testing☆17Sep 18, 2025Updated 5 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- So, I trained a 130M Llama architecture that I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset…☆16Mar 26, 2025Updated 11 months ago
- Repo for solving ARC problems with a Neural Cellular Automata☆23May 21, 2025Updated 9 months ago
- ☆24Updated this week
- Implementation of FizzBuzz on an FPGA☆17Feb 26, 2018Updated 8 years ago
- ☆93Updated this week
- Parallelized 3D FDTD Schrödinger Equation Solver☆20Aug 16, 2018Updated 7 years ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)☆820Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆195Updated this week
- Halfedge mesh library in Rust☆28Nov 20, 2025Updated 3 months ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"☆23Feb 6, 2025Updated last year
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X☆75Feb 11, 2026Updated 2 weeks ago
- Revolutionary AI-powered 4K desktop wallpaper generator with DeepSeek-R1, FLUX-Dev, and 8K supersampling pipeline☆30Aug 6, 2025Updated 6 months ago
- Allows two LLMs to communicate and run code in the terminal☆28Dec 8, 2024Updated last year
- High-Performance FP32 GEMM on CUDA devices☆117Jan 21, 2025Updated last year
- A torch compile backend for multi-targets☆46Updated this week
- ☆27Jul 9, 2024Updated last year
- A tutorial on how to set up an LLM on Google Colab for both GPU-accelerated and CPU-only sessions.☆74Jun 1, 2025Updated 9 months ago
- Multi-Domain Expert Learning☆67Jan 23, 2024Updated 2 years ago
- HIP Python Low-level Bindings☆32Feb 12, 2026Updated 2 weeks ago
- A red teaming agent☆18Oct 15, 2025Updated 4 months ago
- ☆13Oct 5, 2025Updated 4 months ago
- Tile primitives for speedy kernels☆3,183Updated this week
- DietCode Code Release☆65Jul 21, 2022Updated 3 years ago
- GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks☆36Oct 1, 2023Updated 2 years ago
- We aim to redefine Data Parallel libraries' portability, performance, programmability, and maintainability by using C++ standard features, i…☆51Updated this week
- A collection of reproducible inference engine benchmarks☆38Apr 22, 2025Updated 10 months ago
- ☆74Sep 5, 2023Updated 2 years ago
- ☆53Updated this week