tonyzhao-jt/LLM-PQ

Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization"

☆39

Alternatives and similar repositories for LLM-PQ

Users that are interested in LLM-PQ are comparing it to the libraries listed below.

bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 · Feb 23, 2024 · Updated 2 years ago
PeterSH6 / MSPipe
☆16 · Feb 20, 2024 · Updated 2 years ago
jasperzhong / GNNFlow
Distributed Deep Graph Learning Framework for Dynamic Graphs
☆19 · Mar 25, 2024 · Updated 2 years ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆19 · Dec 8, 2023 · Updated 2 years ago
tyler-griggs / melange-release
☆48 · Jun 27, 2024 · Updated 2 years ago
caoting-dotcom / multiBranchModel
Multi-branch model for concurrent execution
☆18 · Jun 27, 2023 · Updated 3 years ago
wassemgtk / MegaScale-Infer-Prototyp
Prototyp MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
☆32 · Apr 4, 2025 · Updated last year
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
raywan-110 / AdaQP
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
☆24 · Mar 1, 2024 · Updated 2 years ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆287 · Feb 2, 2026 · Updated 5 months ago
IBM / LLM-performance-prediction
Predict the performance of LLM inference services
☆23 · Sep 18, 2025 · Updated 10 months ago
MassimoPerini / online-gnn-learning
☆13 · Dec 16, 2021 · Updated 4 years ago
usc-isi / PipeEdge
PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices
☆41 · Jan 31, 2024 · Updated 2 years ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆37 · May 6, 2024 · Updated 2 years ago
llm-db / FineInfer
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
☆19 · May 28, 2024 · Updated 2 years ago
lwy2020 / MicroMix
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
☆28 · Apr 2, 2026 · Updated 3 months ago
UChi-JCL / CacheGen
☆168 · Oct 9, 2024 · Updated last year
LiuXiaoxuanPKU / OSD
☆68 · Dec 3, 2024 · Updated last year
ldbc / data-sets-surf-repository
☆16 · Feb 7, 2026 · Updated 5 months ago
VITA-Group / Q-Hitter
☆15 · Jun 4, 2024 · Updated 2 years ago
jasperzhong / swift
☆15 · Apr 20, 2022 · Updated 4 years ago
cornell-zhang / llm-datatypes
Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
☆27 · Jun 25, 2024 · Updated 2 years ago
Funatiq / gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
☆62 · Jul 1, 2022 · Updated 4 years ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆93 · Oct 15, 2025 · Updated 9 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆100 · Updated this week
thustorage / deft
Deft: A Scalable Tree Index for Disaggregated Memory
☆22 · Apr 23, 2025 · Updated last year
joapolarbear / dpro
Analysis for the traces from byteprofile
☆32 · Nov 21, 2023 · Updated 2 years ago
AutonomicPerfectionist / PipeInfer
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
☆32 · Nov 16, 2024 · Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135 · Feb 22, 2024 · Updated 2 years ago
DS3Lab / Decentralized_FM_alpha
☆18 · May 4, 2023 · Updated 3 years ago
facebookresearch / taser-tgnn
[IPDPS 2024] Adaptive neighbor sampling for temporal GNN
☆16 · Feb 17, 2025 · Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆184 · Jul 12, 2024 · Updated 2 years ago
msr-fiddle / CheckFreq
☆57 · Jan 25, 2021 · Updated 5 years ago
HKU-MedAI / LIG
[AAAI'2025] Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
☆37 · Jan 4, 2026 · Updated 6 months ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆36 · May 29, 2020 · Updated 6 years ago
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆127 · May 9, 2022 · Updated 4 years ago
hao-ai-lab / MuxServe
☆91 · Oct 17, 2025 · Updated 9 months ago
bigrl-team / gear
A distributed GPU-centric experience replay system for large AI models.
☆19 · Aug 1, 2023 · Updated 2 years ago
leodestiny / BGL_NSDI2023
Open source code of BGL NSDI 2023
☆17 · Apr 27, 2026 · Updated 3 months ago