CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
☆204Jun 10, 2026Updated this week
Alternatives and similar repositories for coda-kernels
Users that are interested in coda-kernels are comparing it to the libraries listed below.
- Open-source toolkit for training, Priming, and serving next generation Hybrid architectures☆71May 9, 2026Updated last month
- ☆248Nov 19, 2025Updated 6 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆27Apr 8, 2026Updated 2 months ago
- Flash-Linear-Attention models beyond language☆21Aug 28, 2025Updated 9 months ago
- MICRO 2023 Evaluation Artifact for TeAAL☆11Oct 26, 2023Updated 2 years ago
- Flash Attention in 300-500 lines of CUDA/C++☆36Aug 22, 2025Updated 9 months ago
- ☆50May 20, 2025Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel☆134Jun 24, 2025Updated 11 months ago
- This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond".☆22Nov 17, 2025Updated 6 months ago
- Large DNNs training framework for consumer GPUs☆86Jun 1, 2026Updated last week
- ☆45Nov 1, 2025Updated 7 months ago
- Accelerator Zoo☆20Oct 14, 2025Updated 7 months ago
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations☆90Jan 12, 2026Updated 4 months ago
- ☆520Jun 2, 2026Updated last week
- Artifact for "DX100: A Programmable Data Access Accelerator for Indirection (ISCA 2025)" paper☆18Nov 6, 2025Updated 7 months ago
- [ICLR 2026] FastCar☆16May 22, 2025Updated last year
- Source code for XPGraph-MICRO22☆11Apr 10, 2023Updated 3 years ago
- A repository of ELL models☆21Jan 16, 2026Updated 4 months ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- ☆52May 19, 2025Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆24Jun 3, 2026Updated last week
- performance counters in C++☆28Apr 24, 2026Updated last month
- MICRO 2024 Evaluation Artifact for FuseMax☆17Aug 26, 2024Updated last year
- Learnable Semi-structured Sparsity for Vision Transformers and Diffusion Transformers☆15Feb 7, 2025Updated last year
- ☆541Updated this week
- This GitHub repo contains the artifact for CPElide, which appears at MICRO '24☆16Sep 7, 2024Updated last year
- Official Implementation of APB (ACL 2025 main Oral) and Spava (ACL 2026 main).☆37Apr 6, 2026Updated 2 months ago
- FlashKDA: high-performance Kimi Delta Attention kernels☆447May 26, 2026Updated 2 weeks ago
- ☆19Jan 2, 2026Updated 5 months ago
- A reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated 2 years ago
- Apache DataFusion Benchmarks☆23May 2, 2026Updated last month
- cuASR: CUDA Algebra for Semirings