Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
☆71 · Apr 25, 2025 · Updated 11 months ago
Alternatives and similar repositories for SpecEE
Users that are interested in SpecEE are comparing it to the libraries listed below.
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆125 · Dec 25, 2025 · Updated 3 months ago
- [HPCA 2026 Best Paper Candidate] Official implementation of "Focus: A Streaming Concentration Architecture for Efficient Vision-Language … ☆46 · Feb 8, 2026 · Updated 2 months ago
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆121 · Apr 13, 2026 · Updated last week
- ☆11 · Jul 1, 2025 · Updated 9 months ago
- Extending BookSim2.0 and HotSpot6.0 for Power, Performance and Thermal evaluation of 3D NoC Architectures ☆13 · Aug 9, 2019 · Updated 6 years ago
- GPU-accelerated LLM Training Simulator ☆18 · Jun 26, 2025 · Updated 9 months ago
- ☆12 · Jan 9, 2026 · Updated 3 months ago
- A tool for checking the contract satisfaction for hardware designs ☆13 · Nov 4, 2025 · Updated 5 months ago
- ☆17 · May 10, 2024 · Updated last year
- ☆120 · Nov 17, 2023 · Updated 2 years ago
- [ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆82 · Jul 14, 2025 · Updated 9 months ago
- ☆11 · Apr 16, 2023 · Updated 3 years ago
- DeepGate3 for ICCAD2024 ☆14 · May 26, 2025 · Updated 10 months ago
- ☆16 · Dec 9, 2023 · Updated 2 years ago
- analyse problems of AI with Math and Code ☆27 · Jul 28, 2025 · Updated 8 months ago
- Sys, but no longer in Haskell ☆19 · Mar 14, 2022 · Updated 4 years ago
- Tool for compiling Lean to WASM ☆24 · Mar 17, 2024 · Updated 2 years ago
- Venus Collective Communication Library, supported by SII and Infrawaves. ☆142 · Apr 13, 2026 · Updated last week
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] ☆19 · Aug 4, 2022 · Updated 3 years ago
- [HPCA 2022] GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design ☆39 · Mar 30, 2022 · Updated 4 years ago
- The wafer-native AI accelerator simulation platform and inference engine. ☆53 · Jan 1, 2026 · Updated 3 months ago
- This repository contains the artifact for the SOSP'23 paper: Sishuai Gong, Dinglan Peng, Deniz Altınbüken, Pedro Fonseca, Petros Maniati… ☆15 · Oct 24, 2023 · Updated 2 years ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 · Mar 11, 2026 · Updated last month
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆826 · Mar 6, 2025 · Updated last year
- ☆12 · Jan 12, 2024 · Updated 2 years ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆110 · Dec 15, 2025 · Updated 4 months ago
- No elixirs brewed, no meditation sat; neither merchant nor farmer. In idle hours I paint green mountains to sell, and never take ill-gotten money. ☆41 · Sep 14, 2019 · Updated 6 years ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆20 · Jun 3, 2025 · Updated 10 months ago
- MICRO 2023 Evaluation Artifact for TeAAL ☆10 · Oct 26, 2023 · Updated 2 years ago
- λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23) ☆14 · Apr 2, 2025 · Updated last year
- Simulator for HDD/SSD, derived from the CMU PDL DiskSim, with the SSD-add-on patch from Microsoft Research applied. ☆15 · Dec 30, 2019 · Updated 6 years ago
- ☆23 · Mar 18, 2024 · Updated 2 years ago
- Instruction Pointer Classifier and Dynamic Degree Stream based Hardware Cache Prefetching ☆16 · Nov 16, 2019 · Updated 6 years ago
- ☆23 · Dec 23, 2025 · Updated 3 months ago
- ☆62 · Jun 3, 2025 · Updated 10 months ago
- ☆15 · Apr 11, 2024 · Updated 2 years ago
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models ☆28 · Apr 2, 2026 · Updated 2 weeks ago
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆37 · Aug 29, 2025 · Updated 7 months ago
- Official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability". ☆56 · Dec 25, 2025 · Updated 3 months ago