☆238 · Jun 9, 2026 · Updated this week
Alternatives and similar repositories for KernelWiki
Users that are interested in KernelWiki are comparing it to the libraries listed below.
- A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo… ☆133 · May 22, 2026 · Updated 3 weeks ago
- (elastic) cuckoo hashing ☆17 · Jun 20, 2020 · Updated 5 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Effective transpose on Hopper GPU ☆29 · Sep 6, 2025 · Updated 9 months ago
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration ☆51 · Jan 8, 2026 · Updated 5 months ago
- ☆122 · May 16, 2025 · Updated last year
- ☆11 · Apr 5, 2021 · Updated 5 years ago
- HierCGRA: An Open-Source Framework for Large-Scale CGRA with Hierarchical Modeling and Automated Exploration ☆14 · Mar 6, 2023 · Updated 3 years ago
- ☆14 · Feb 23, 2025 · Updated last year
- CS169.1x Software as a Service course offered by UC Berkeley at edx.org ☆14 · Oct 28, 2014 · Updated 11 years ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) ☆30 · Jan 22, 2026 · Updated 4 months ago
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple. ☆27 · Apr 21, 2025 · Updated last year
- Expert Specialization MoE Solution based on CUTLASS ☆27 · Apr 14, 2026 · Updated 2 months ago
- Codes for MO's Trading ☆16 · Mar 20, 2022 · Updated 4 years ago
- DeeperGEMM: crazy optimized version ☆86 · May 5, 2025 · Updated last year
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆109 · Dec 2, 2025 · Updated 6 months ago
- Cute layout visualization ☆40 · Jan 18, 2026 · Updated 4 months ago
- Official repository for Flash Local Linear Attention ☆36 · May 28, 2026 · Updated 2 weeks ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" ☆32 · Mar 25, 2026 · Updated 2 months ago
- ☆11 · Feb 13, 2025 · Updated last year
- PANDA: Architecture-Level Power Evaluation by Unifying Analytical and Machine Learning Solutions ☆19 · Dec 18, 2023 · Updated 2 years ago
- A PyTorch-Based GPU Parallel Env for IPPS Problem, supporting DRL, IL and Learning Guided MCTS. ☆17 · May 25, 2026 · Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- A cross-platform RISC-V interpreter that implements the RV32IMA instruction set. ☆24 · Aug 23, 2022 · Updated 3 years ago
- Implement Flash Attention using Cute. ☆108 · Dec 17, 2024 · Updated last year
- PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development. ☆207 · Dec 24, 2025 · Updated 5 months ago
- An Efficient and Versatile Inference Engine for Distributed LLM Serving ☆60 · Jun 8, 2026 · Updated last week
- Self-hosted responsive photo/album manager and server written in Node.js, Koa2, React, and Redux ☆11 · May 25, 2017 · Updated 9 years ago
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json. ☆30 · Jan 4, 2026 · Updated 5 months ago
- A simple neural network written in C++17 with the Eigen library, supporting both forward and backward propagation. ☆11 · Jul 27, 2024 · Updated last year
- A simple and efficient memory pool implemented in C++11. ☆10 · Jun 2, 2022 · Updated 4 years ago
- Optimize GEMM with tensorcore step by step ☆37 · Dec 17, 2023 · Updated 2 years ago
- Repo for PyChart 1.39, refs http://download.gna.org/pychart/ ☆10 · Sep 29, 2014 · Updated 11 years ago
- ☆14 · Jul 13, 2025 · Updated 11 months ago
- PyTorch implementation of Chinese multi-turn dialogue (Cdial) based on GPT + NEZHA ☆11 · Oct 22, 2022 · Updated 3 years ago
- The vLLM XPU kernels for Intel GPU ☆47 · Updated this week
- Database kernel notes ☆14 · Aug 18, 2022 · Updated 3 years ago
- A deep learning intermediate representation for multi-platform compilation optimization ☆10 · Oct 28, 2024 · Updated last year
- Collection of scripts to build PyTorch and the domain libraries from source. ☆14 · Updated this week