infinigence/SpecEE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/infinigence/SpecEE)

infinigence / SpecEE

Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)

☆75

Alternatives and similar repositories for SpecEE

Users that are interested in SpecEE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

infinigence / Semi-PD
View on GitHub
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆127Dec 25, 2025Updated 6 months ago
dubcyfor3 / Focus
View on GitHub
[HPCA 2026 Best Paper Candidate] Official implementation of "Focus: A Streaming Concentration Architecture for Efficient Vision-Language …
☆59Feb 8, 2026Updated 5 months ago
redbird-arch / isca2025-chimera-artifact
View on GitHub
Artifact of Chimera
☆18May 6, 2025Updated last year
AI9Stars / SpecMQuant
View on GitHub
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
☆23May 29, 2025Updated last year
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
clevercool / ANT-Quantization
View on GitHub
☆123Nov 17, 2023Updated 2 years ago
infinigence / HamiltonAttention
View on GitHub
☆45Oct 15, 2025Updated 9 months ago
snu-comparch / Tender
View on GitHub
Tender: Accelerating Large Language Models via Tensor Decompostion and Runtime Requantization (ISCA'24)
☆34Jul 4, 2024Updated 2 years ago
HPMLL / ZipServ_ASPLOS26
View on GitHub
☆50Dec 19, 2025Updated 7 months ago
Kimho666 / LLM_Hardware_Survey
View on GitHub
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
☆18Jul 15, 2025Updated last year
arkhadem / aim_simulator
View on GitHub
A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0
☆69Jul 22, 2025Updated 11 months ago
abdelfattah-lab / BitMoD-HPCA-25
View on GitHub
☆157Jul 19, 2025Updated last year
casys-kaist / oaken
View on GitHub
Artifact for Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
☆17May 9, 2025Updated last year
OpenBitSys / vlut.cpp
View on GitHub
[MobiSys 2026] On-device ultra-low-bit LLM inference with LUT.
☆22Jun 24, 2026Updated 3 weeks ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
jiashu-z / how-to-plot
View on GitHub
How to plot for papers, slides, demos, etc.
☆10Apr 7, 2022Updated 4 years ago
SJTU-IPADS / MetaAttention
View on GitHub
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends(PPoPP'26)
☆16Dec 31, 2025Updated 6 months ago
weihai-98 / A-2Q
View on GitHub
☆15Sep 30, 2023Updated 2 years ago
yc2367 / P3-LLM
View on GitHub
☆23Apr 3, 2026Updated 3 months ago
SNU-ARC / DecDEC
View on GitHub
[OSDI 2025] DecDEC: A Systems Approach to Advancing Low‑Bit LLM Quantization
☆26Jan 29, 2026Updated 5 months ago
GATECH-EIC / GCoD
View on GitHub
[HPCA 2022] GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design
☆38Mar 30, 2022Updated 4 years ago
LLMServe / hydraserve
View on GitHub
☆20May 11, 2026Updated 2 months ago
dgSPARSE / dgSPARSE-Lib
View on GitHub
PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
☆122Jul 13, 2026Updated last week
I-Doctor / gnn-acceleration-framework-with-FPGA
View on GitHub
including compiler to encode DGL GNN model to instructions, runtime software to transfer data and control the accelerator, and hardware v…
☆14Nov 19, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
tsinghua-ideal / ANSMET
View on GitHub
An accelerator for high-dimensional approximate nearest neighbor search
☆15May 17, 2025Updated last year
leesou / H2-LLM-ISCA-2025
View on GitHub
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
☆113Apr 26, 2025Updated last year
thustorage / PetPS
View on GitHub
PetPS: Supporting Huge Embedding Models with Tiered Memory
☆34May 21, 2024Updated 2 years ago
yaohsiaopid / rtl2uspec
View on GitHub
☆11Jul 1, 2025Updated last year
Yinxiao-Feng / chiplet-network-sim
View on GitHub
☆65Jun 3, 2025Updated last year
fix-project / fix
View on GitHub
☆12Jun 10, 2026Updated last month
zilongwang123 / LeaVe
View on GitHub
A tool for checking the contract satisfaction for hardware designs
☆12Jun 22, 2026Updated 3 weeks ago
PKU-SEC-Lab / AdapMoE
View on GitHub
Code release for AdapMoE accepted by ICCAD 2024
☆39Apr 28, 2025Updated last year
CLab-HKUST-GZ / micro58-axcore
View on GitHub
☆41Oct 21, 2025Updated 8 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
zejia-lin / BulletServe
View on GitHub
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53Jan 8, 2026Updated 6 months ago
taowen / hexagon-tutorial
View on GitHub
hexagon tutorial
☆54Mar 29, 2026Updated 3 months ago
horizon-research / Efficient-Deep-Learning-for-Point-Clouds
View on GitHub
☆49Apr 22, 2021Updated 5 years ago
Dystopians / SecDOOD
View on GitHub
[ ICCV'2025 Poster ] SecDOOD is a secure cloud-device collaboration framework for efficient on-device OOD detection without requiring de…
☆15Oct 10, 2025Updated 9 months ago
gudiandian / ElasticFlow
View on GitHub
☆17May 10, 2024Updated 2 years ago
SJTU-ReArch-Group / M2XFP_ASPLOS26
View on GitHub
[ASPLOS 2026] M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.
☆15Jan 29, 2026Updated 5 months ago
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year