raymin0223/fast_robust_early_exit

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)

☆67

Alternatives and similar repositories for fast_robust_early_exit

Users who are interested in fast_robust_early_exit are comparing it to the libraries listed below.

joonkeekim / Instructive-Decoding
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…
☆21 · Mar 7, 2024 · Updated 2 years ago
llyx97 / Rosita
[AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan
☆14 · Oct 18, 2022 · Updated 3 years ago
pan-x-c / EE-LLM
EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).
☆82 · Jun 14, 2024 · Updated 2 years ago
NJUNLP / MCSD
Multi-Candidate Speculative Decoding
☆41 · Apr 22, 2024 · Updated 2 years ago
dilab-zju / self-speculative-decoding
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
☆230 · Feb 13, 2025 · Updated last year
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆307 · Dec 22, 2024 · Updated last year
oujieww / ANPD
☆11 · Feb 5, 2026 · Updated 5 months ago
FranxYao / Retrieval-Head-with-Flash-Attention
Efficient retrieval head analysis with Triton flash attention that supports top-K probability
☆13 · Jun 15, 2024 · Updated 2 years ago
DRSY / KV_Compression
[EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens
☆25 · Nov 6, 2023 · Updated 2 years ago
joonkeekim / hare-hate-speech
Official repository of "HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning", Findings of EMNLP 2023
☆28 · Jan 25, 2024 · Updated 2 years ago
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆72 · Jun 26, 2024 · Updated 2 years ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆374 · Jul 20, 2026 · Updated last week
prateeky2806 / ComPEFT
☆26 · Nov 23, 2023 · Updated 2 years ago
Infini-AI-Lab / TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281 · Aug 31, 2024 · Updated last year
FasterDecoding / TEAL
☆168 · Feb 15, 2025 · Updated last year
hdong920 / LESS
☆53 · May 13, 2024 · Updated 2 years ago
wln20 / Attention-Viewer
A plug-and-play tool for visualizing attention-score heatmaps in generative LLMs. Easy to customize for your own needs.
☆52 · May 16, 2024 · Updated 2 years ago
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆99 · Feb 6, 2024 · Updated 2 years ago
FMInference / DejaVu
☆359 · Apr 2, 2024 · Updated 2 years ago
etri-edgeai / nn-comp-llm
☆17 · Oct 10, 2024 · Updated last year
liangyuRain / ForestColl
☆20 · Jun 1, 2026 · Updated last month
jongwooko / distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
☆267 · Mar 13, 2025 · Updated last year
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…
☆52 · May 4, 2024 · Updated 2 years ago
ChandlerGuan / Transkimmer
Code for the ACL 2022 publication "Transkimmer: Transformer Learns to Layer-wise Skim"
☆22 · Aug 21, 2022 · Updated 3 years ago
etri-edgeai / nn-dist-train-poc
☆17 · Oct 12, 2024 · Updated last year
lfsszd / CS-Drafting
Cascade Speculative Drafting
☆33 · Apr 2, 2024 · Updated 2 years ago
zhengkid / Parallel_Thinking_via_MoT
Official Code for "Learning to Reason via Mixture-of-Thought for Logical Reasoning"
☆29 · Nov 20, 2025 · Updated 8 months ago
linfeng93 / BiTA
An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.
☆29 · Apr 15, 2025 · Updated last year
ClubieDong / QAQ-KVCacheQuantization
QAQ: Quality Adaptive Quantization for LLM KV Cache
☆55 · Mar 27, 2024 · Updated 2 years ago
Seondong / LocEmb
LocEmb: Location Embedding (Currently covering districts, roads, and businesses in Korea)
☆11 · Aug 15, 2022 · Updated 3 years ago
scrambledpie / GPVAE
Train and visualise a latent variable model of moving objects.
☆16 · Apr 28, 2020 · Updated 6 years ago
sade-adrien / SteloCoder
☆16 · Dec 21, 2023 · Updated 2 years ago
epfml / dynamic-sparse-flash-attention
☆152 · Jun 2, 2023 · Updated 3 years ago
jongwooko / Pytorch-MiniLM
Unofficial PyTorch implementation of MiniLM and MiniLMv2
☆23 · Jan 30, 2022 · Updated 4 years ago
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆20 · Jul 19, 2024 · Updated 2 years ago
dannyallover / overthinking_the_truth
☆29 · Apr 30, 2024 · Updated 2 years ago
HanGuo97 / lq-lora
☆129 · Jan 22, 2024 · Updated 2 years ago
FasterDecoding / REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220 · Mar 5, 2026 · Updated 4 months ago
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆402 · Apr 22, 2025 · Updated last year