goliaro/specinfer-ae

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/goliaro/specinfer-ae)

goliaro / specinfer-ae

☆28

Alternatives and similar repositories for specinfer-ae

Users that are interested in specinfer-ae are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

NJUNLP / MCSD
View on GitHub
Multi-Candidate Speculative Decoding
☆41Apr 22, 2024Updated 2 years ago
uw-mad-dash / decoding-speculative-decoding
View on GitHub
☆16Aug 19, 2024Updated last year
caoshiyi / artifacts
View on GitHub
☆40Nov 28, 2024Updated last year
kimkoech / Systolic-Array
View on GitHub
C++ SystemC Implementation of a Systolic Array
☆16May 15, 2020Updated 6 years ago
LLMServe / hydraserve
View on GitHub
☆20May 11, 2026Updated 2 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
yandex-research / specexec
View on GitHub
☆68Nov 4, 2024Updated last year
VITA-Group / Q-Hitter
View on GitHub
☆15Jun 4, 2024Updated 2 years ago
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆921Aug 22, 2024Updated last year
a1bc2def6g / fastgl-ae
View on GitHub
☆17Jun 25, 2024Updated 2 years ago
promoe-opensource / promoe
View on GitHub
☆20Jan 27, 2025Updated last year
AutonomicPerfectionist / PipeInfer
View on GitHub
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
☆32Nov 16, 2024Updated last year
PSCLab-ASU / Systolic-CNN
View on GitHub
☆18Feb 13, 2021Updated 5 years ago
iankur / vqllm
View on GitHub
Residual vector quantization for KV cache compression in large language model
☆12Oct 22, 2024Updated last year
SFU-HiAccel / HiSpMV
View on GitHub
[TRETS 2025][FPGA 2024] FPGA Accelerator for Imbalanced SpMV using HLS
☆23Aug 24, 2025Updated 11 months ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
inpluslab-wuhui / Systems-for-Foundation-Models
View on GitHub
☆20May 10, 2025Updated last year
awslabs / optimizing-multitask-training-through-dynamic-pipelines
View on GitHub
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆19Dec 8, 2023Updated 2 years ago
VITA-Group / llm-kick
View on GitHub
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
☆27Apr 21, 2025Updated last year
harshanarayana / kube-scheduler
View on GitHub
Custom Python Scheduler for Kubernetes
☆15Jan 25, 2020Updated 6 years ago
hbiyik / hw_necromancer
View on GitHub
☆10Jun 6, 2026Updated last month
Xilinx / libdfx
View on GitHub
☆13Jun 14, 2026Updated last month
ybingo / OI_Sharing
View on GitHub
分享收集的在算法竞赛、数据结构方面的课件、论文、书籍、OJ网站、习题。
☆14May 21, 2020Updated 6 years ago
deJQK / FracBits
View on GitHub
Neural Network Quantization With Fractional Bit-widths
☆11Feb 19, 2021Updated 5 years ago
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆402Apr 22, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
NorbertZheng / SoC-Lab
View on GitHub
大二下学期——计算机系统综合(SoC)——实验
☆11Apr 23, 2019Updated 7 years ago
MAC-AutoML / MotionCache
View on GitHub
[ICML 2026]
☆17Jul 4, 2026Updated 3 weeks ago
Mixture-AI / Mixture-of-Depths
View on GitHub
Google DeepMind: Mixture of Depths Unofficial Implementation.
☆12May 29, 2024Updated 2 years ago
sharc-lab / GenGNN
View on GitHub
☆37Jan 20, 2022Updated 4 years ago
luoyesiqiu / LibInject
View on GitHub
DLL注入工具
☆13Nov 9, 2020Updated 5 years ago
tissue3 / EyerissSimulator
View on GitHub
Eyeriss chip simulator
☆41Mar 6, 2020Updated 6 years ago
flexflow / flexflow-train
View on GitHub
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,898Updated this week
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆230Feb 13, 2025Updated last year
NonvolatileMemory / flash_tree_attn
View on GitHub
☆20Dec 24, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
IronySuzumiya / NiuDianNao
View on GitHub
A simple cycle-accurate DaDianNao simulator
☆13Mar 27, 2019Updated 7 years ago
RW999creator / GPP-PIM
View on GitHub
☆10Sep 26, 2024Updated last year
Relaxed-System-Lab / HexGen
View on GitHub
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆37May 6, 2024Updated 2 years ago
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,283Jun 27, 2026Updated last month
lucidrains / speculative-decoding
View on GitHub
Explorations into some recent techniques surrounding speculative decoding
☆307Dec 22, 2024Updated last year
lol0963332320 / ICLAB
View on GitHub
☆19Mar 23, 2023Updated 3 years ago
P1ayer-1 / Llama-LibTorch
View on GitHub
Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5
☆16Sep 19, 2024Updated last year