☆34 · Jun 22, 2024 · Updated last year
Alternatives and similar repositories for InferCept
Users who are interested in InferCept compare it to the libraries listed below.
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆25 · Nov 21, 2024 · Updated last year
- ☆12 · Oct 16, 2022 · Updated 3 years ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆64 · Oct 2, 2025 · Updated 4 months ago
- Paper-reading notes for Berkeley OS prelim exam. ☆14 · Aug 28, 2024 · Updated last year
- ☆17 · May 10, 2024 · Updated last year
- ☆19 · May 4, 2023 · Updated 2 years ago
- ☆131 · Nov 11, 2024 · Updated last year
- A Streaming-Native Serving Engine for TTS/STS Models ☆55 · Updated this week
- APEX+ is an LLM Serving Simulator ☆42 · Jun 16, 2025 · Updated 8 months ago
- Query-Adaptive Vector Search ☆68 · Feb 13, 2026 · Updated 2 weeks ago
- Stateful LLM Serving ☆96 · Mar 11, 2025 · Updated 11 months ago
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" ☆19 · Mar 8, 2025 · Updated 11 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆74 · Sep 15, 2025 · Updated 5 months ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆94 · Jul 14, 2023 · Updated 2 years ago
- ☆64 · Dec 3, 2024 · Updated last year
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆77 · Oct 15, 2025 · Updated 4 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆174 · Jul 10, 2024 · Updated last year
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆30 · Jun 14, 2024 · Updated last year
- Nex Venus Communication Library ☆72 · Nov 17, 2025 · Updated 3 months ago
- Prefix-Aware Attention for LLM Decoding ☆27 · Jan 23, 2026 · Updated last month
- Asynchronous pipeline parallel optimization ☆19 · Feb 2, 2026 · Updated 3 weeks ago
- A parallelism VAE avoids OOM for high resolution image generation ☆85 · Aug 4, 2025 · Updated 6 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 · Feb 22, 2024 · Updated 2 years ago
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆36 · Aug 29, 2025 · Updated 6 months ago
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] ☆52 · Mar 5, 2025 · Updated 11 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆35 · Jan 9, 2023 · Updated 3 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated last year
- Code for the paper "SMACE: A New Method for the Interpretability of Composite Decision Systems", ECML 2022 ☆15 · Apr 17, 2023 · Updated 2 years ago
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆30 · Oct 30, 2025 · Updated 4 months ago
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" ☆48 · Jul 29, 2025 · Updated 7 months ago
- Enhanced Explainable Neural Network ☆10 · Dec 25, 2021 · Updated 4 years ago
- ☆30 · Aug 31, 2022 · Updated 3 years ago
- build a simple key value store based on LSM tree like rocksdb/leveldb ☆41 · Mar 16, 2020 · Updated 5 years ago
- ☆74 · Sep 15, 2025 · Updated 5 months ago
- ☆164 · Jul 15, 2025 · Updated 7 months ago
- JSSP dataset for LLMs ☆17 · May 29, 2025 · Updated 9 months ago
- Official code for Conformal Isometry of Lie Group Representation in Recurrent Network of Grid Cells (NeurIPS workshop on Symmetry and Geo… ☆13 · Nov 1, 2022 · Updated 3 years ago
- Comparing sequential forecasters via confidence sequences & e-processes ☆11 · Oct 24, 2023 · Updated 2 years ago
- ☆10 · Oct 26, 2022 · Updated 3 years ago