ScalingIntelligence/hydragen

Hydragen: High-Throughput LLM Inference with Shared Prefixes

☆56

Alternatives and similar repositories for hydragen

Users who are interested in hydragen are comparing it to the libraries listed below.

Infini-AI-Lab / Sparrow
☆16 · Jun 15, 2026 · Updated last month
jordan-benjamin / pydra
Simple, flexible configuration in pure Python!
☆32 · Jul 1, 2025 · Updated last year
madsys-dev / deepseekv2-profile
☆156 · Mar 4, 2025 · Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆207 · Mar 7, 2025 · Updated last year
PanZaifeng / FastTree-Artifact
☆32 · Mar 24, 2025 · Updated last year
ScalingIntelligence / large_language_monkeys
☆117 · Sep 25, 2024 · Updated last year
ademeure / cuda-side-boost
☆60 · Feb 24, 2026 · Updated 5 months ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57 · Aug 6, 2025 · Updated 11 months ago
zhang677 / PCL-lite
[ICML 2025] Adaptive Self-improvement LLM Agentic System for ML Library Development
☆17 · Jan 6, 2026 · Updated 6 months ago
ScalingIntelligence / codemonkeys
☆59 · Jan 28, 2025 · Updated last year
layer6ai-labs / UoMH
☆16 · Aug 7, 2023 · Updated 2 years ago
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Jul 14, 2026 · Updated last week
sail-sg / VocabularyParallelism
Vocabulary Parallelism
☆26 · Mar 10, 2025 · Updated last year
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆176 · Jun 16, 2026 · Updated last month
Olivia-fsm / DoGE
Codebase for ICML submission "DOGE: Domain Reweighting with Generalization Estimation"
☆21 · Feb 29, 2024 · Updated 2 years ago
sgl-project / sgl-kernel-xpu
SGLang kernel library for Intel XPU
☆27 · Updated this week
rsusik / rambenchmark
Simple RAM benchmark for Linux.
☆12 · Aug 4, 2021 · Updated 4 years ago
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆55 · Updated this week
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆329 · Jun 10, 2025 · Updated last year
xAlg-ai / HashAttention-1.0
☆18 · Sep 23, 2025 · Updated 10 months ago
ScalingIntelligence / tokasaurus
☆483 · Nov 25, 2025 · Updated 8 months ago
shengshu-ai / TurboServe
TurboServe: Serving Streaming Video Generation Efficiently and Economically
☆37 · Jul 12, 2026 · Updated last week
marius-team / quake
Query-Adaptive Vector Search
☆77 · Mar 19, 2026 · Updated 4 months ago
casper-hansen / AutoAWQ_kernels
☆80 · Nov 26, 2024 · Updated last year
togethercomputer / ParallelKernelBench
☆44 · Jul 1, 2026 · Updated 3 weeks ago
linxihui / dkernel
☆22 · Apr 17, 2025 · Updated last year
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆181 · Updated this week
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆344 · Jul 2, 2024 · Updated 2 years ago
allenai / olmix
☆41 · May 26, 2026 · Updated last month
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆37 · Oct 9, 2025 · Updated 9 months ago
cchan / tccl
extensible collectives library in triton
☆97 · Mar 31, 2025 · Updated last year
mrhrifat / al-quran
Al Quran is the holy book of Islam. Muslims believe that the Quran was revealed by Allah (SWT) to the final prophet & messenger, Muhammad…
☆12 · Apr 30, 2023 · Updated 3 years ago
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆41 · May 26, 2026 · Updated last month
EECS150 / fpga_labs_fa21
FPGA Labs for EECS 151/251A (Fall 2021)
☆13 · Oct 20, 2021 · Updated 4 years ago
redai-infra / PIPO
Implementation of an efficient LLM architecture: the Pair-In / Pair-Out Model (PIPO)
☆42 · Jun 10, 2026 · Updated last month
itsdaniele / speculative_mamba
☆18 · Nov 28, 2024 · Updated last year