Fast and Furious AMD Kernels
☆415 · May 13, 2026 · Updated last week
Alternatives and similar repositories for HipKittens
Users that are interested in HipKittens are comparing it to the libraries listed below.
- ☆15 · Jun 10, 2022 · Updated 3 years ago
- ☆24 · Apr 7, 2026 · Updated last month
- This repo contains data and code for the paper "Reasoning over Public and Private Data in Retrieval-Based Systems." ☆46 · Jul 18, 2024 · Updated last year
- AI Tensor Engine for ROCm ☆430 · Updated this week
- Super fast FP32 matrix multiplication on RDNA3 ☆89 · Mar 30, 2025 · Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆24 · Updated this week
- Sample Codes using NVSHMEM on Multi-GPU ☆30 · Jan 22, 2023 · Updated 3 years ago
- amdgpu example code in hip/asm ☆61 · Apr 22, 2026 · Updated 3 weeks ago
- ☆52 · May 19, 2025 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆441 · Mar 30, 2026 · Updated last month
- PeRL: Parameter-Efficient Reinforcement Learning ☆79 · May 3, 2026 · Updated 2 weeks ago
- Tile primitives for speedy kernels ☆3,360 · May 11, 2026 · Updated last week
- ☆57 · Feb 24, 2026 · Updated 2 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 · May 12, 2026 · Updated last week
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆52 · Jul 4, 2025 · Updated 10 months ago
- AiTer Optimized Model ☆88 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆98 · Updated this week
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 3 months ago
- Repository used for my master's thesis on implementing RVSDG as a dialect of MLIR ☆13 · May 30, 2023 · Updated 2 years ago
- ☆60 · Jul 9, 2024 · Updated last year
- ☆66 · Apr 26, 2025 · Updated last year
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Check to see if an SDist matches Git ☆12 · May 4, 2026 · Updated 2 weeks ago
- Ship correct and fast LLM kernels to PyTorch ☆150 · Jan 14, 2026 · Updated 4 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆166 · May 4, 2026 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆576 · Nov 7, 2025 · Updated 6 months ago
- A practical way of learning Swizzle ☆39 · Feb 3, 2025 · Updated last year
- [ICML 2026] Reasoning in Parallelism via Self-Distilled RL ☆110 · Feb 5, 2026 · Updated 3 months ago
- ☆266 · Jul 11, 2024 · Updated last year
- ☆119 · May 19, 2025 · Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆134 · Apr 10, 2026 · Updated last month
- Open ABI and FFI for Machine Learning Systems ☆395 · May 11, 2026 · Updated last week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆261 · May 11, 2026 · Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆189 · May 12, 2026 · Updated last week
- ☆13 · May 11, 2026 · Updated last week
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆106 · Updated this week
- CUDA extensions for PyTorch ☆12 · Dec 2, 2025 · Updated 5 months ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,259 · May 11, 2026 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,007 · Mar 24, 2026 · Updated last month