Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
☆1,370 · Mar 19, 2026 · Updated 2 months ago
Alternatives and similar repositories for autokernel
Users who are interested in autokernel are comparing it to the libraries listed below.
- A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It … ☆168 · Apr 22, 2026 · Updated last month
- Triton kernels for Flux ☆23 · Jul 7, 2025 · Updated 10 months ago
- ☆46 · Nov 1, 2025 · Updated 6 months ago
- 👷 Build compute kernels ☆213 · Apr 6, 2026 · Updated last month
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated last year
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Apr 22, 2025 · Updated last year
- FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels ☆167 · Apr 26, 2026 · Updated last month
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆39 · Sep 24, 2024 · Updated last year
- Low overhead tracing library and trace visualizer for pipelined CUDA kernels ☆137 · Nov 26, 2025 · Updated 6 months ago
- A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers. ☆1,178 · Updated this week
- ☆176 · Apr 23, 2025 · Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,020 · Mar 24, 2026 · Updated 2 months ago
- KV Cache & LoRA for minGPT ☆63 · Mar 4, 2026 · Updated 2 months ago
- Making code editing up to 7.7x faster using multi-layer speculation ☆23 · Feb 20, 2025 · Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 6 months ago
- Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model… ☆98 · May 18, 2026 · Updated last week
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- Triton with an OpenCL backend, using mlir-translate to emit OpenCL source code ☆27 · Aug 27, 2025 · Updated 8 months ago
- RAPIDS Deployment Documentation ☆15 · May 18, 2026 · Updated last week
- Implement GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs ☆173 · Oct 25, 2025 · Updated 7 months ago
- Fast low-bit matmul kernels in Triton ☆458 · May 15, 2026 · Updated last week
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss ☆14 · Sep 4, 2023 · Updated 2 years ago
- A hackable library for running and fine-tuning modern transformer models on commodity and alternative GPUs, powered by tinygrad. ☆29 · Feb 10, 2026 · Updated 3 months ago
- ☆52 · Mar 3, 2026 · Updated 2 months ago
- ☆45 · May 3, 2026 · Updated 3 weeks ago
- ☆48 · Apr 16, 2026 · Updated last month
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,271 · May 20, 2026 · Updated last week
- Official Repo of CudaForge ☆82 · Dec 2, 2025 · Updated 5 months ago
- Repository for "Training Language Models To Explain Their Own Computations" ☆22 · Dec 22, 2025 · Updated 5 months ago
- A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo… ☆126 · May 8, 2026 · Updated 2 weeks ago
- Gym-Anything: Turn any Software into an Agent Environment ☆234 · May 18, 2026 · Updated last week
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆442 · Mar 30, 2026 · Updated last month
- High-Performance FP32 GEMM on CUDA devices ☆125 · Jan 21, 2025 · Updated last year
- ☆42 · Dec 15, 2022 · Updated 3 years ago
- Shor's algorithm simulation using CUDA ☆19 · Nov 10, 2019 · Updated 6 years ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆60 · Oct 18, 2025 · Updated 7 months ago
- Real-time speech enhancement with skip-dpcrn-base in C++ ☆25 · Nov 12, 2022 · Updated 3 years ago
- Aligntune: A Modular Toolkit for Post-Training Alignment of LLMs ☆36 · Apr 29, 2026 · Updated 3 weeks ago
- My study notes and hands-on projects for CUDA-based GPU programming ☆12 · Dec 11, 2025 · Updated 5 months ago