Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
☆71Apr 25, 2025Updated 10 months ago
Alternatives and similar repositories for SpecEE
Users who are interested in SpecEE are comparing it to the repositories listed below
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆123Dec 25, 2025Updated 2 months ago
- How to plot for papers, slides, demos, etc.☆10Apr 7, 2022Updated 3 years ago
- including compiler to encode DGL GNN model to instructions, runtime software to transfer data and control the accelerator, and hardware v…☆14Nov 19, 2023Updated 2 years ago
- ☆13Sep 30, 2023Updated 2 years ago
- [HPCA 2026 Best Paper Candidate] Official implementation of "Focus: A Streaming Concentration Architecture for Efficient Vision-Language …☆33Feb 8, 2026Updated 3 weeks ago
- GPU-accelerated LLM Training Simulator☆17Jun 26, 2025Updated 8 months ago
- HW/SW co-designed end-host RPC stack☆20Oct 28, 2021Updated 4 years ago
- ☆17May 10, 2024Updated last year
- Book reading☆16Jun 5, 2020Updated 5 years ago
- Venus Collective Communication Library, supported by SII and Infrawaves.☆138Updated this week
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)☆19May 28, 2024Updated last year
- ☆23Mar 18, 2024Updated last year
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- A benchmark suited especially for deep learning operators☆42Feb 13, 2023Updated 3 years ago
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search☆25Jul 20, 2019Updated 6 years ago
- ☆116Nov 17, 2023Updated 2 years ago
- analyse problems of AI with Math and Code☆27Jul 28, 2025Updated 7 months ago
- ☆33Jun 6, 2023Updated 2 years ago
- The Next-gen Language & Compiler Powering Efficient Hardware Design☆36Jan 16, 2025Updated last year
- ☆29Apr 4, 2024Updated last year
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"☆102Dec 15, 2025Updated 2 months ago
- ☆527Feb 10, 2026Updated 3 weeks ago
- Artifacts of EuroSys'24 paper "Exploring Performance and Cost Optimization with ASIC-Based CXL Memory"☆31Feb 21, 2024Updated 2 years ago
- USTC Computational Physics A☆10Aug 16, 2021Updated 4 years ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆817Mar 6, 2025Updated last year
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.☆30Feb 12, 2022Updated 4 years ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"☆28Feb 18, 2022Updated 4 years ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.☆70Mar 20, 2025Updated 11 months ago
- TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches☆80Jul 25, 2023Updated 2 years ago
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆36Aug 29, 2025Updated 6 months ago
- ☆21Nov 12, 2025Updated 3 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆35Jan 9, 2023Updated 3 years ago
- ☆79Mar 7, 2022Updated 3 years ago
- PetPS: Supporting Huge Embedding Models with Tiered Memory☆33May 21, 2024Updated last year
- paper and its code for AI System☆351Feb 10, 2026Updated 3 weeks ago
- ☆10Mar 8, 2025Updated 11 months ago
- Learn how to create impactful AI Agents using Agno AI Python Package☆13Jul 31, 2025Updated 7 months ago
- netbeacon - monitoring your network capture, NIDS or network analysis process☆19Oct 26, 2013Updated 12 years ago
- Source code analysis of langgraph's deepagent☆15Jan 1, 2026Updated 2 months ago