wzzll123 / MultiKernelBench
MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation
☆42 · Updated 2 weeks ago
Alternatives and similar repositories for MultiKernelBench
Users interested in MultiKernelBench are comparing it to the libraries listed below.
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …☆313 · Updated 7 months ago
- ☆155 · Updated 11 months ago
- High-performance Transformer implementation in C++. ☆150 · Updated last year
- ☆131 · Updated last year
- Allows torch tensor memory to be released and resumed later. ☆216 · Updated 3 weeks ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆676 · Updated this week
- ☆47 · Updated last year
- ☆342 · Updated last week
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap…☆283 · Updated 11 months ago
- nnScaler: Compiling DNN models for Parallel Training. ☆124 · Updated 4 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems. ☆151 · Updated this week
- A lightweight design for computation-communication overlap. ☆219 · Updated 2 weeks ago
- Summary of some awesome work on optimizing LLM inference. ☆172 · Updated 2 months ago
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive. ☆66 · Updated last month
- Learning TileLang with 10 puzzles! ☆118 · Updated last week
- Dynamic Memory Management for Serving LLMs without PagedAttention. ☆457 · Updated 8 months ago
- Tile-Based Runtime for Ultra-Low-Latency LLM Inference. ☆564 · Updated last week
- LLM training technologies developed by kwai. ☆70 · Updated 2 weeks ago
- High-Performance LLM Inference Operator Library. ☆695 · Updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference. ☆370 · Updated 6 months ago
- ☆164 · Updated 6 months ago
- Nex Venus Communication Library. ☆72 · Updated 2 months ago
- Papers and their code for AI systems. ☆347 · Updated last month
- ☆15 · Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 · Updated last month
- Utility scripts for PyTorch (e.g. make Perfetto show some disappearing kernels, a memory profiler that understands more low-level allocatio…☆83 · Updated 4 months ago
- PyTorch library for cost-effective, fast, and easy serving of MoE models. ☆279 · Updated this week
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding. ☆87 · Updated 2 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆771 · Updated 10 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer. ☆161 · Updated 4 months ago