lightseekorg/tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

☆1,589

Alternatives and similar repositories for tokenspeed

Users who are interested in tokenspeed are comparing it to the repositories listed below.

lightseekorg / smg
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini &…
☆391 • Updated this week
lightseekorg / TorchSpec
A PyTorch native library for training speculative decoding models
☆200 • Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,957 • Updated this week
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆450 • May 26, 2026 • Updated last month
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆251 • Jun 21, 2026 • Updated 3 weeks ago
BBuf / AI-Infra-Auto-Driven-SKILLS
☆670 • Updated this week
deepseek-ai / TileKernels
A kernel library written in tilelang
☆1,642 • Apr 23, 2026 • Updated 2 months ago
sgl-project / sglang-omni
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
☆627 • Updated this week
mit-han-lab / kernel-design-agents
☆743 • Jun 2, 2026 • Updated last month
QwenLM / FlashQLA
high-performance linear attention kernel library built on TileLang
☆590 • Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,821 • Updated this week
openinfer-project / openinfer
Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2
☆536 • Updated this week
z-lab / dflash
DFlash: Block Diffusion for Flash Speculative Decoding
☆5,468 • May 10, 2026 • Updated 2 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,488 • Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,133 • Updated this week
deepseek-ai / DeepSpec
DeepSpec: a full-stack codebase for training and evaluating speculative decoding algorithms
☆6,638 • Updated this week
open-lm-engine / coda-kernels
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
☆227 • Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,306 • Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,483 • Updated this week
mit-han-lab / KernelWiki
☆303 • Jun 9, 2026 • Updated last month
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,636 • Updated this week
vllm-project / vime
An LLM post-training framework with vLLM for RL Scaling
☆363 • Updated this week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆588 • Nov 7, 2025 • Updated 8 months ago
lucifer1004 / VeloQ
Agent-friendly GPU profile-query CLI
☆105 • Jun 22, 2026 • Updated 3 weeks ago
BBuf / KDA-Pilot
☆224 • Updated this week
novitalabs / pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S…
☆172 • Updated this week
tile-ai / TileFoundry
☆44 • Updated this week
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,546 • Updated this week
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,341 • Aug 28, 2025 • Updated 10 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆730 • Jul 4, 2026 • Updated last week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,076 • Updated this week
mlc-ai / pith-train
Compact and Agent-Native MoE Training System
☆253 • Updated this week
LMCache / LMCache
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
☆10,547 • Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,710 • Updated this week
RightNow-AI / autokernel
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
☆1,458 • Mar 19, 2026 • Updated 3 months ago
mit-han-lab / ncu-report-skill
☆155 • May 24, 2026 • Updated last month
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆130 • May 20, 2026 • Updated last month
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆967 • Mar 29, 2026 • Updated 3 months ago
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 • Jul 3, 2026 • Updated last week