☆88May 24, 2026Updated this week
Alternatives and similar repositories for KernelWiki
Users that are interested in KernelWiki are comparing it to the libraries listed below.
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆48Jan 8, 2026Updated 4 months ago
- ☆15Jan 21, 2021Updated 5 years ago
- EMNLP-2021 paper: Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems.☆16Nov 11, 2021Updated 4 years ago
- ☆26Jul 15, 2024Updated last year
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.☆27Apr 21, 2025Updated last year
- Next-Generation AI-Assisted Kernel Engineering for Multi-Chip Systems☆49May 9, 2026Updated 2 weeks ago
- ☆99Mar 31, 2026Updated last month
- Stable diffusion dedicated Hardware with multiple pipelined processor cores☆14Apr 9, 2026Updated last month
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling☆59Updated this week
- A rendering engine based on Vulkan, designed to implement various graphics algorithms.☆29Sep 28, 2023Updated 2 years ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention☆53Aug 6, 2025Updated 9 months ago
- ☆58Mar 31, 2026Updated last month
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation.☆11Jul 27, 2024Updated last year
- Simple and efficient memory pool is implemented with C++11.☆10Jun 2, 2022Updated 3 years ago
- Optimize GEMM with tensorcore step by step☆37Dec 17, 2023Updated 2 years ago
- Persistent Kernel + JIT-Injected Operators (CUDA)☆47Jan 27, 2026Updated 3 months ago
- ☆14Jul 13, 2025Updated 10 months ago
- PyTorch implementation of Chinese multi-turn dialogue (CDial) based on GPT + NEZHA☆11Oct 22, 2022Updated 3 years ago
- ☆18May 6, 2026Updated 2 weeks ago
- The vLLM XPU kernels for Intel GPU☆44May 19, 2026Updated last week
- Database kernel notes☆14Aug 18, 2022Updated 3 years ago
- A deep learning intermediate representation for multi-platform compilation optimization☆10Oct 28, 2024Updated last year
- Collection of scripts to build PyTorch and the domain libraries from source.☆14Apr 1, 2026Updated last month
- Low overhead tracing library and trace visualizer for pipelined CUDA kernels☆137Nov 26, 2025Updated 6 months ago
- GEMV implementation with CUTLASS☆21Aug 21, 2025Updated 9 months ago
- (elastic) cuckoo hashing☆17Jun 20, 2020Updated 5 years ago
- Fast and memory-efficient exact attention☆21Apr 10, 2026Updated last month
- ☆10Dec 8, 2021Updated 4 years ago
- Janus-Series: Unified Multimodal Understanding and Generation Models☆15Jan 28, 2025Updated last year
- A plugin to make view transformer from perspective view to bird-eye-view, it is used in bevdet☆24Feb 24, 2023Updated 3 years ago
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive☆71Dec 11, 2025Updated 5 months ago
- ☆14Dec 21, 2025Updated 5 months ago
- Repository for the Kaggle Shopee competition☆11Jun 7, 2021Updated 4 years ago
- ☆20Nov 14, 2023Updated 2 years ago
- Local LLM Inference Speed Test Tool☆71May 15, 2026Updated last week
- ☆11May 2, 2023Updated 3 years ago
- Flash Attention in ~100 lines of CUDA (forward pass only)☆12Jun 10, 2024Updated last year
- A CLI tool for managing your locally downloaded Huggingface models and datasets☆35Aug 19, 2025Updated 9 months ago