Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
☆1,415 · Mar 19, 2026 · Updated 3 months ago
Alternatives and similar repositories for autokernel
Users that are interested in autokernel are comparing it to the libraries listed below.
- LLM4Kernel: A Survey of Large Language Models for GPU Kernel Development ☆73 · Mar 31, 2026 · Updated 2 months ago
- A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It … ☆177 · Apr 22, 2026 · Updated last month
- Triton kernels for Flux ☆23 · Jul 7, 2025 · Updated 11 months ago
- ☆293 · Updated this week
- ☆48 · Nov 1, 2025 · Updated 7 months ago
- Yuan3.0: Mixture-of-Experts (MoE) Language Model ☆187 · Apr 7, 2026 · Updated 2 months ago
- mKernel: fast multi-node, multi-GPU fused kernels ☆237 · Jun 8, 2026 · Updated last week
- 👷 Build compute kernels ☆213 · Apr 6, 2026 · Updated 2 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated last year
- FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels ☆171 · Apr 26, 2026 · Updated last month
- ☆132 · Jun 6, 2026 · Updated last week
- infinite coding agent ☆86 · Jun 11, 2026 · Updated last week
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆40 · Sep 24, 2024 · Updated last year
- This is the official implementation for the paper "Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-… ☆36 · Mar 30, 2026 · Updated 2 months ago
- Low overhead tracing library and trace visualizer for pipelined CUDA kernels ☆136 · Nov 26, 2025 · Updated 6 months ago
- ☆176 · Apr 23, 2025 · Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,060 · Mar 24, 2026 · Updated 2 months ago
- KV Cache & LoRA for minGPT ☆61 · Mar 4, 2026 · Updated 3 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆123 · Mar 6, 2024 · Updated 2 years ago
- Making code editing up to 7.7x faster using multi-layer speculation ☆24 · Feb 20, 2025 · Updated last year
- Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model… ☆100 · May 18, 2026 · Updated last month
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 6 months ago
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code ☆27 · Aug 27, 2025 · Updated 9 months ago
- RAPIDS Deployment Documentation ☆15 · Jun 10, 2026 · Updated last week
- implement GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs ☆175 · Oct 25, 2025 · Updated 7 months ago
- Fast low-bit matmul kernels in Triton ☆471 · May 15, 2026 · Updated last month
- A Transformer Model Exploiting Histology Images and Spatial Gene Expression ☆22 · Mar 18, 2025 · Updated last year
- A hackable library for running and fine-tuning modern transformer models on commodity and alternative GPUs, powered by tinygrad. ☆30 · Feb 10, 2026 · Updated 4 months ago
- ☆20 · Updated this week
- ☆46 · May 3, 2026 · Updated last month
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,305 · Updated this week
- Repository for "Training Language Models To Explain Their Own Computations" ☆22 · Dec 22, 2025 · Updated 5 months ago
- Official Repo of CudaForge ☆84 · Dec 2, 2025 · Updated 6 months ago
- A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo… ☆134 · May 22, 2026 · Updated 3 weeks ago
- poorman's ar-dit tts ☆45 · Dec 31, 2025 · Updated 5 months ago
- High-Performance FP32 GEMM on CUDA devices ☆125 · Jan 21, 2025 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆441 · Mar 30, 2026 · Updated 2 months ago
- ☆41 · Dec 15, 2022 · Updated 3 years ago
- Paging Debug tool for GDB using python ☆13 · Jun 4, 2022 · Updated 4 years ago