zejia-lin/BulletServe

Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration

☆53

Alternatives and similar repositories for BulletServe

Users who are interested in BulletServe are comparing it to the libraries listed below.

Nelson-Cheung / yatsenos-riscv
Rebuild YatSenOS On RISC-V 64.
☆23 · Jan 6, 2022 · Updated 4 years ago
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
HPMLL / ZipServ_ASPLOS26
☆52 · Dec 19, 2025 · Updated 7 months ago
wu-kan / GoPTX
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
☆21 · Jul 30, 2025 · Updated 11 months ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
MoZeWei / moTuner
☆10 · May 12, 2022 · Updated 4 years ago
EfficientLLMSys / MuxServe
☆15 · Jun 26, 2024 · Updated 2 years ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
GZTimeWalker / YYDB
Yat another MySQL storage engine, a database course project.
☆13 · Dec 23, 2022 · Updated 3 years ago
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 8 months ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week
OSU-STARLAB / UVM_benchmark
☆34 · Sep 9, 2020 · Updated 5 years ago
SiriusInfTra / Sirius
☆18 · Sep 21, 2025 · Updated 10 months ago
yuanxinnn / APTMoE
☆13 · Jun 29, 2024 · Updated 2 years ago
gty111 / gLLM
An Efficient and Versatile Inference Engine for Distributed LLM Serving
☆66 · Updated this week
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25 · May 12, 2025 · Updated last year
LLMServe / FastServe
☆29 · Sep 26, 2025 · Updated 10 months ago
ScaleX-IO / uGDS
A user-space GPU Direct Storage library
☆184 · Jul 17, 2026 · Updated last week
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆41 · May 26, 2026 · Updated 2 months ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆57 · May 29, 2024 · Updated 2 years ago
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75 · Dec 11, 2025 · Updated 7 months ago
SYSU-SCC / sysu-scc-spack-repo
Spack package repository maintained by Student Cluster Competition Team @ Sun Yat-sen University.
☆16 · Aug 20, 2025 · Updated 11 months ago
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
smart-lty / nano-PEARL
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211 · Mar 18, 2026 · Updated 4 months ago
thustorage / GCR
code repo for GCR [FAST'26]
☆16 · Mar 3, 2026 · Updated 4 months ago
vortexgpgpu / Volt
☆18 · Feb 9, 2026 · Updated 5 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
☆155 · Apr 23, 2026 · Updated 3 months ago
NEO-MLSys25 / NEO
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆100 · Jun 16, 2025 · Updated last year
howardlau1999 / yatcpu
Yet another toy CPU.
☆92 · Dec 10, 2023 · Updated 2 years ago
SYSU-SCC / yatcpu-docs
Documentation for YatCPU
☆55 · Nov 15, 2023 · Updated 2 years ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47 · May 13, 2025 · Updated last year
Multi-LLM / prism-research
Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71 · Mar 17, 2026 · Updated 4 months ago
HNUSystemsLab / ZBTree
ZBTree A Hotness-Aware B+-Tree for Persistent Memory
☆17 · May 4, 2024 · Updated 2 years ago
aoli-al / HFuse
Horizontal Fusion
☆24 · Jan 7, 2022 · Updated 4 years ago
owensgroup / ATOS
Multi-GPU dynamic scheduler using PGAS style cross-GPU communication
☆29 · Jul 23, 2023 · Updated 3 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆243 · Jan 20, 2026 · Updated 6 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆127 · Dec 25, 2025 · Updated 7 months ago
microsoft / chunk-attention
☆89 · Apr 18, 2025 · Updated last year
MLSysU / EcoServe
[OSDI' 26] Efficient LLM Serving on Commodity GPU Clusters with Data-Reduced Cross-Instance Orchestration
☆23 · Jul 5, 2026 · Updated 3 weeks ago