QLM-project/QLM

QLM-project / QLM

☆32

Alternatives and similar repositories for QLM

Users who are interested in QLM compare it to the repositories listed below.

hao-ai-lab / MuxServe
☆90 · Oct 17, 2025 · Updated 9 months ago
eddiegaoo / Apt-Serve
☆21 · Jun 9, 2025 · Updated last year
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13 · Mar 7, 2024 · Updated 2 years ago
junhongmit / P-and-B
🧠 Plan-and-Budget: Training-free test-time reasoning framework for adaptive token allocation in large language models (ICLR 2026).
☆15 · Mar 2, 2026 · Updated 4 months ago
Multi-LLM / prism-research
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71 · Mar 17, 2026 · Updated 4 months ago
pku-lemonade / TokenSim
TokenSim is a tool for simulating the behavior of large language models (LLMs) in a distributed environment.
☆27 · Jun 26, 2026 · Updated 3 weeks ago
KevinLee1110 / dynamic-batching
The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"
☆18 · Mar 17, 2025 · Updated last year
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
UNITES-Lab / Occult
[ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13 · Apr 17, 2025 · Updated last year
uclasystem / VQPy
A language for video analytics
☆12 · Jan 26, 2023 · Updated 3 years ago
pie-project / pie
Pie: Programmable LLM Serving
☆184 · Updated this week
mutinifni / splitwise-sim
LLM serving cluster simulator
☆157 · Apr 25, 2024 · Updated 2 years ago
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 · May 23, 2023 · Updated 3 years ago
James-QiuHaoran / LLM-serving-with-proxy-models
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆52 · Jun 1, 2024 · Updated 2 years ago
Azure / AzurePublicDataset
Microsoft Azure Traces
☆1,159 · Jun 3, 2026 · Updated last month
MachineLearningSystem / 26FAST-PipeANN
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
☆15 · Jul 1, 2025 · Updated last year
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆169 · Updated this week
opendatahub-io / caikit-tgis-serving
☆21 · Updated this week
tyler-griggs / melange-release
☆48 · Jun 27, 2024 · Updated 2 years ago
gty111 / gLLM
An Efficient and Versatile Inference Engine for Distributed LLM Serving
☆66 · Updated this week
astra-sim / astra-network-ns3
☆14 · Mar 15, 2026 · Updated 4 months ago
hyhuang00 / moe_inference
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19 · Oct 30, 2024 · Updated last year
lorenzogentile404 / feldman-verifiable-secret-sharing
☆10 · Apr 29, 2020 · Updated 6 years ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆164 · Nov 26, 2025 · Updated 7 months ago
ovg-project / GVM
☆23 · Jan 18, 2026 · Updated 6 months ago
g4197 / FreshDiskANN-baseline
A FreshDiskANN baseline used by OdinANN [FAST '26] evaluation.
☆16 · Jun 16, 2025 · Updated last year
NetSys / resq
☆15 · Aug 6, 2018 · Updated 7 years ago
universome / non-uniform-interpolation
Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257
☆11 · Oct 3, 2021 · Updated 4 years ago
SpaceNetLab / SKYFALL
SKYFALL: dynamically identifies and exploits bottleneck links with a geo-distributed botnet to flood them.
☆13 · Oct 23, 2024 · Updated last year
ranggihwang / Pregated_MoE
☆62 · May 4, 2024 · Updated 2 years ago
mininet / mininet-util
Mininet monitoring and plotting utilities
☆20 · Aug 17, 2017 · Updated 8 years ago
pfandzelter / LLEOSCN-CDN-Sim
Simulation tool for CDN replication in large low-earth orbit satellite access networks.
☆12 · May 17, 2021 · Updated 5 years ago
SalesforceAIResearch / xRouter
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
☆31 · Jun 2, 2026 · Updated last month
noahdasilva / satellite-network-sim
A Python program that simulates a satellite network using pygame, allowing users to create, configure, and visualize the network state ov…
☆11 · Apr 25, 2023 · Updated 3 years ago
LMCache / lmcache-agent-trace
Agent application/benchmark/workload traces should be placed here.
☆15 · Apr 13, 2026 · Updated 3 months ago
ajtejankar / mixtral-vis-moe
Visualize expert firing frequencies across sentences in the Mixtral MoE model
☆18 · Dec 22, 2023 · Updated 2 years ago
Starlink-Project / Satellite-vs-Cellular
LEO Satellite vs. Cellular Networks: Exploring the Potential for Synergistic Integration (CoNEXT '23)
☆11 · Oct 26, 2023 · Updated 2 years ago
freecores / divider
Hardware Division Units
☆10 · Jul 17, 2014 · Updated 12 years ago