Hydragen: High-Throughput LLM Inference with Shared Prefixes
☆55 · May 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for hydragen
Users that are interested in hydragen are comparing it to the libraries listed below.
- ☆157 · Mar 4, 2025 · Updated last year
- ☆116 · Sep 25, 2024 · Updated last year
- Query-Adaptive Vector Search ☆76 · Mar 19, 2026 · Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆205 · Mar 7, 2025 · Updated last year
- Vocabulary Parallelism ☆26 · Mar 10, 2025 · Updated last year
- Simple RAM benchmark for Linux. ☆12 · Aug 4, 2021 · Updated 4 years ago
- Benchmarking Intelligence Efficiency of LM Inference ☆64 · Jun 18, 2026 · Updated 2 weeks ago
- Codebase for ICML submission "DOGE: Domain Reweighting with Generalization Estimation" ☆21 · Feb 29, 2024 · Updated 2 years ago
- FPGA Labs for EECS 151/251A (Fall 2021) ☆12 · Oct 20, 2021 · Updated 4 years ago
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis ☆12 · Nov 17, 2024 · Updated last year
- ☆17 · Nov 28, 2024 · Updated last year
- Source code of paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 · Oct 24, 2024 · Updated last year
- SGEMM optimization with cuda step by step ☆23 · Mar 23, 2024 · Updated 2 years ago
- Stateful LLM Serving ☆103 · Mar 11, 2025 · Updated last year
- ☆80 · Nov 26, 2024 · Updated last year
- Test equality between a black-box LLM API and a reference distribution ☆18 · Oct 29, 2024 · Updated last year
- A dock created with React, TypeScript, Electron and Flask to support SWE's with ADHD ☆20 · Mar 24, 2026 · Updated 3 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆342 · Jul 2, 2024 · Updated 2 years ago
- [ICML 2025] Adaptive Self-improvement LLM Agentic System for ML Library Development ☆17 · Jan 6, 2026 · Updated 5 months ago
- A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of … ☆329 · Jun 10, 2025 · Updated last year
- ☆15 · Apr 15, 2026 · Updated 2 months ago
- ACM SoCC 2019, "Coupling Decentralized Key-Value Stores with Erasure Coding" ☆15 · May 22, 2021 · Updated 5 years ago
- ☆34 · Jun 22, 2024 · Updated 2 years ago
- ☆65 · Apr 26, 2025 · Updated last year
- LLM Inference analyzer for different hardware platforms ☆115 · Jun 23, 2026 · Updated last week
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆194 · Feb 11, 2026 · Updated 4 months ago
- extensible collectives library in triton ☆98 · Mar 31, 2025 · Updated last year
- Index of Knowledge ☆16 · Jan 6, 2023 · Updated 3 years ago
- ☆17 · Sep 15, 2021 · Updated 4 years ago
- ☆123 · May 19, 2025 · Updated last year
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- ☆258 · Oct 24, 2025 · Updated 8 months ago
- Hex encode & decode a string, right from your terminal. ☆10 · Jan 5, 2023 · Updated 3 years ago
- A GPU FP32 computation method with Tensor Cores. ☆27 · Dec 8, 2025 · Updated 6 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆113 · Sep 10, 2024 · Updated last year
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆489 · Jun 11, 2026 · Updated 3 weeks ago
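- MIT 6.S081 lab notes, with a Docker + code-server (web-based VS Code) setup that provides a clean, ready-to-use lab environment; see the website below for detailed usage instructions ☆12 · Jan 28, 2024 · Updated 2 years ago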
- ☆20 · Dec 24, 2024 · Updated last year