uservan / speculative_thinking

☆30

Alternatives and similar repositories for speculative_thinking

Users who are interested in speculative_thinking are comparing it to the libraries listed below.

ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆59 Updated 2 months ago
mutonix / pyramidinfer
☆49 Updated last year
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆52 Updated last year
OpenSparseLLMs / Linear-MoE
☆120 Updated 6 months ago
hyx1999 / SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆38 Updated 9 months ago
Jikai0Wang / Speculative_CoT
☆19 Updated 6 months ago
abdelfattah-lab / SplitReason
☆21 Updated 2 weeks ago
thunlp / FR-Spec
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆48 Updated 4 months ago
Jikai0Wang / OPT-Tree
☆29 Updated 6 months ago
hemingkx / SWIFT
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆60 Updated 9 months ago
Dominic789654 / LongGenBench
Source code for the paper "LongGenBench: Long-context Generation Benchmark"
☆24 Updated last year
sail-sg / LongSpec
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆68 Updated 4 months ago
Linking-ai / SCOPE
(ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆33 Updated 6 months ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆52 Updated last year
WANGXinyiLinda / planning_tokens
Official code for Guiding Language Model Math Reasoning with Planning Tokens
☆18 Updated last year
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆116 Updated 4 months ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆49 Updated 4 months ago
NJUNLP / MCSD
Multi-Candidate Speculative Decoding
☆37 Updated last year
thu-wyz / inference_scaling
☆76 Updated last year
mozhu621 / LongGenBench
☆29 Updated 2 months ago
NonvolatileMemory / GliDe_with_a_CaPE_ICML_24
Official code for GliDe with a CaPE
☆18 Updated last year
yaof20 / DenseMixer
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient
☆61 Updated 4 months ago
zyxxmu / cam
PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆47 Updated last year
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆132 Updated last month
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆98 Updated 9 months ago
BaohaoLiao / RSD
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
☆52 Updated 7 months ago
alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆17 Updated 11 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆119 Updated 11 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆112 Updated 8 months ago
Jingyu6 / speculative_prefill
☆47 Updated 6 months ago