Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆99Aug 20, 2023Updated 2 years ago
Alternatives and similar repositories for speculative-sampling
Users that are interested in speculative-sampling are comparing it to the libraries listed below.
- Explorations into some recent techniques surrounding speculative decoding☆305Dec 22, 2024Updated last year
- Fast inference from large language models via speculative decoding☆917Aug 22, 2024Updated last year
- [NeurIPS'23] Speculative Decoding with Big Little Decoder☆98Feb 6, 2024Updated 2 years ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind☆111Feb 29, 2024Updated 2 years ago
- Beyond LM: How can language model go forward in the future?☆15Apr 30, 2023Updated 3 years ago
- [COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation☆15Jul 15, 2024Updated last year
- An implementation of SGEMV with performance comparable to cuBLAS.☆12May 21, 2021Updated 5 years ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads☆2,751Jun 25, 2024Updated last year
- Dive-into-LLMs Tutorial for Beginners☆25May 14, 2024Updated 2 years ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆397Apr 22, 2025Updated last year
- This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Mai…☆37Oct 5, 2025Updated 8 months ago
- Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.☆110Dec 2, 2024Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).☆2,397Feb 20, 2026Updated 3 months ago
- AskUp Search ChatGPT Plugin☆20May 27, 2023Updated 3 years ago
- Calculating Expected Time for training LLM.☆39Apr 17, 2023Updated 3 years ago
- ☆23Jul 10, 2023Updated 2 years ago
- Minimal RLHF implementation built on top of minGPT.☆32Jul 4, 2024Updated last year
- Reading list for multimodal sequence learning☆14Sep 4, 2023Updated 2 years ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆722Aug 13, 2024Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding☆1,337Mar 6, 2025Updated last year
- Guide for fixing 99-100% of cracking sound issues on Dell XPS 15 9570☆11Nov 1, 2018Updated 7 years ago
- ☆19Aug 1, 2020Updated 5 years ago
- ☆107Jun 20, 2023Updated 2 years ago
- ☆24Mar 19, 2022Updated 4 years ago
- ☆33Apr 12, 2021Updated 5 years ago
- Auxiliary tasks for task-oriented dialogue systems. Published in ICNLSP'22 and indexed in the ACL Anthology.☆17Feb 27, 2023Updated 3 years ago
- 2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261☆13Aug 22, 2021Updated 4 years ago
- annotated-transformer-kr☆15May 16, 2019Updated 7 years ago
- ☆44Apr 22, 2026Updated last month
- [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"☆44May 22, 2025Updated last year
- [HCLT 2022] Korean sentence text similarity dataset using naver shopping review☆25Oct 20, 2022Updated 3 years ago
- CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).☆62May 3, 2022Updated 4 years ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).☆114May 2, 2022Updated 4 years ago
- GPT-jax based on the official huggingface library☆13Jun 22, 2021Updated 4 years ago
- This is the official repo for the CVPR 2021 L2ID paper "Distill on the Go: Online knowledge distillation in self-supervised learning"☆12Nov 15, 2021Updated 4 years ago
- ☆20Nov 3, 2024Updated last year
- BCQ tutorial for transformers☆16Jul 17, 2023Updated 2 years ago
- Performing downstream tasks using huggingface☆62Dec 28, 2021Updated 4 years ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning☆69Oct 31, 2025Updated 7 months ago