tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization"
☆36Aug 29, 2025Updated 5 months ago
Alternatives and similar repositories for LLM-PQ
Users that are interested in LLM-PQ are comparing it to the libraries listed below
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated last year
- ☆15Feb 20, 2024Updated last year
- ☆47Jun 27, 2024Updated last year
- Distributed Deep Graph Learning Framework for Dynamic Graphs☆19Mar 25, 2024Updated last year
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆76Oct 15, 2025Updated 4 months ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆25Nov 21, 2024Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs☆27Jun 25, 2024Updated last year
- Deft: A Scalable Tree Index for Disaggregated Memory☆23Apr 23, 2025Updated 9 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆13Dec 9, 2024Updated last year
- ☆10Apr 29, 2023Updated 2 years ago
- ☆17Jan 27, 2025Updated last year
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths☆15Jul 10, 2025Updated 7 months ago
- ☆19Jun 1, 2025Updated 8 months ago
- ☆14Mar 29, 2020Updated 5 years ago
- ☆64Dec 3, 2024Updated last year
- [IPDPS 2024] Adaptive neighbor sampling for temporal GNN☆16Feb 17, 2025Updated 11 months ago
- ☆16Feb 7, 2026Updated last week
- Stateful LLM Serving☆95Mar 11, 2025Updated 11 months ago
- Analysis for the traces from byteprofile☆32Nov 21, 2023Updated 2 years ago
- ☆56Jan 25, 2021Updated 5 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems☆62Jul 1, 2022Updated 3 years ago
- Multi-branch model for concurrent execution☆18Jun 27, 2023Updated 2 years ago
- ☆13Dec 16, 2021Updated 4 years ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines☆19Dec 8, 2023Updated 2 years ago
- ☆151Oct 9, 2024Updated last year
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications☆127May 9, 2022Updated 3 years ago
- A resilient distributed training framework☆96Apr 11, 2024Updated last year
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion☆32May 15, 2024Updated last year
- ☆15Jun 4, 2024Updated last year
- AI model training on heterogeneous, geo-distributed resources☆35Nov 24, 2025Updated 2 months ago
- ddl-benchmarks: Benchmarks for Distributed Deep Learning☆36May 29, 2020Updated 5 years ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters.☆34May 6, 2024Updated last year
- ☆15Apr 20, 2022Updated 3 years ago
- Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"☆22Apr 13, 2024Updated last year
- ☆19May 4, 2023Updated 2 years ago
- ☆85Oct 17, 2025Updated 3 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)☆53Dec 17, 2024Updated last year
- Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"☆40Jul 6, 2023Updated 2 years ago
- Open source code of BGL NSDI 2023☆18Jul 24, 2023Updated 2 years ago