TokenSpeed is a speed-of-light LLM inference engine.
☆1,004May 14, 2026Updated last week
Alternatives and similar repositories for tokenspeed
Users that are interested in tokenspeed are comparing it to the libraries listed below.
- Intel® SHMEM - Device initiated shared memory based communication library☆32Nov 12, 2025Updated 6 months ago
- https://bbuf.github.io/gpu-glossary-zh/☆27Nov 7, 2025Updated 6 months ago
- ☆37Aug 7, 2025Updated 9 months ago
- ☆80Apr 29, 2026Updated 3 weeks ago
- FlashKDA: high-performance Kimi Delta Attention kernels☆424Apr 22, 2026Updated 3 weeks ago
- Orchestration and memory for multi-agent systems☆14Feb 6, 2026Updated 3 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆532May 5, 2026Updated 2 weeks ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆189May 12, 2026Updated last week
- Learning High-Quality and General-Purpose Phrase Representations. Findings of EACL 2024☆16Feb 29, 2024Updated 2 years ago
- KV cache store for distributed LLM inference☆419Nov 13, 2025Updated 6 months ago
- Distributed ML Optimizer☆35Jul 28, 2021Updated 4 years ago
- AI memory system combining vector search with temporal knowledge graph. Built-in cognitive engine for agents. Supports memory decay, cont…☆72Updated this week
- My study note for mlsys☆14Nov 4, 2024Updated last year
- Python library to add support for embedding natural code in Python with shared program state.☆30Jan 20, 2026Updated 4 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆103Apr 7, 2026Updated last month
- An asynchronous streaming data management module for efficient post-training.☆72Updated this week
- JSON Logging for Sanic☆10Sep 1, 2021Updated 4 years ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference☆297May 1, 2025Updated last year
- Memory Topology for GPUs☆19May 11, 2026Updated last week
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆126Dec 25, 2025Updated 4 months ago
- ☆351Apr 16, 2026Updated last month
- ☆17Nov 20, 2024Updated last year
- FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]☆50Apr 29, 2026Updated 3 weeks ago
- Examples for KubeEdge☆13Sep 29, 2020Updated 5 years ago
- [KDD 2025] The source code for UQABench☆13Aug 18, 2025Updated 9 months ago
- ☆12Jan 17, 2024Updated 2 years ago
- implement GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs☆173Oct 25, 2025Updated 6 months ago
- ☆341May 8, 2026Updated last week
- Perplexity GPU Kernels☆576Nov 7, 2025Updated 6 months ago
- Spatial Transformer Network (STN) provides attention to a particular region in an image, by doing transformation to the input image. T…☆15Dec 21, 2020Updated 5 years ago
- ☆166Dec 27, 2024Updated last year
- incubator repo for CUDA-TileIR backend☆135Apr 22, 2026Updated 3 weeks ago
- Notes on some interesting little ideas related to GPT☆15Mar 20, 2023Updated 3 years ago
- ☆16Jul 12, 2024Updated last year
- We will be open sourcing a tool called FARSI (Facebook AR system investigator), a design space exploration framework. FARSI enables an ag…☆32Oct 30, 2022Updated 3 years ago
- PnLClaw — local-first crypto and prediction market quant engine.☆173Apr 8, 2026Updated last month
- ☆52May 19, 2025Updated last year
- A Quirky Assortment of CuTe Kernels☆972Updated this week