KaiLv69/DuoDecoding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/KaiLv69/DuoDecoding)

KaiLv69 / DuoDecoding

DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

☆19

Alternatives and similar repositories for DuoDecoding

Users that are interested in DuoDecoding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

tengxiaoliu / LM_skip
View on GitHub
[NeurIPS 2024] Can Language Models Learn to Skip Steps?
☆21Jan 25, 2025Updated last year
Leosang-lx / FlowSpec
View on GitHub
Continuous Pipelined Speculative Decoding
☆21May 25, 2026Updated last month
NoakLiu / Awesome-Distributed-RL
View on GitHub
A Collection for Distributed Reinforcement Learning Papers
☆18Sep 24, 2025Updated 9 months ago
ayyyq / TARA
View on GitHub
code for [ACL23] An AMR-based Link Prediction Approach for Document-level Event Argument Extraction
☆24Oct 2, 2023Updated 2 years ago
yhcc / utcie
View on GitHub
This is the code repo for the paper <UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction>
☆15Aug 10, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
smart-lty / REST
View on GitHub
The code of paper "Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction" in NeurIPS 2023.
☆14Nov 25, 2023Updated 2 years ago
iLearn-Lab / ACL25-PTQ1.61
View on GitHub
☆15Apr 6, 2026Updated 3 months ago
OpenLMLab / LongWanjuan
View on GitHub
Towards Systematic Measurement for Long Text Quality
☆39Sep 5, 2024Updated last year
smart-lty / ParallelSpeculativeDecoding
View on GitHub
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆169Dec 23, 2025Updated 6 months ago
jiwonsong-dev / ReasoningPathCompression
View on GitHub
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆32Oct 20, 2025Updated 9 months ago
aiha-lab / MX-QLLM
View on GitHub
LLM Inference with Microscaling Format
☆35Nov 12, 2024Updated last year
NoakLiu / FastCache-xDiT
View on GitHub
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
☆52Apr 29, 2026Updated 2 months ago
OpenMOSS / Thus-Spake-Long-Context-LLM
View on GitHub
a survey of long-context LLMs from four perspectives, architecture, infrastructure, training, and evaluation
☆62Mar 31, 2025Updated last year
tim-lawson / skip-middle
View on GitHub
Learning to Skip the Middle Layers of Transformers
☆17Aug 7, 2025Updated 11 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
tengxiaoliu / XoT
View on GitHub
[EMNLP 2023] Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
☆27Nov 4, 2023Updated 2 years ago
oujieww / ANPD
View on GitHub
☆11Feb 5, 2026Updated 5 months ago
HKUNLP / STRING
View on GitHub
[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
☆82Nov 25, 2024Updated last year
gudiandian / ElasticFlow
View on GitHub
☆17May 10, 2024Updated 2 years ago
GAIR-NLP / alignment-for-honesty
View on GitHub
☆78May 22, 2024Updated 2 years ago
Adaxry / Unified_Layer_Skipping
View on GitHub
☆15Apr 11, 2024Updated 2 years ago
KevinLee1110 / dynamic-batching
View on GitHub
The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"
☆18Mar 17, 2025Updated last year
MobiSense / SpecOffload-public
View on GitHub
☆29Feb 3, 2026Updated 5 months ago
xiami2019 / CLAIF
View on GitHub
[Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback
☆40Aug 14, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
QipengGuo / NLP-Notes
View on GitHub
Notes of my introduction about NLP in Fudan University
☆37Jul 6, 2021Updated 5 years ago
Tomorrowdawn / top_nsigma
View on GitHub
The official code repo and data hub of top_nsigma sampling strategy for LLMs.
☆26Feb 11, 2025Updated last year
NonvolatileMemory / flash_tree_attn
View on GitHub
☆20Dec 24, 2024Updated last year
Summer-Summer / Kitty
View on GitHub
Algorithm-System Co-design: accurate and efficient 2-bit KV cache quantization for LLM Inference.
☆17May 20, 2026Updated 2 months ago
icip-cas / DeepSolution
View on GitHub
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
☆48Dec 18, 2025Updated 7 months ago
LiuXiaoxuanPKU / Cost-Model-papers
View on GitHub
☆13Feb 22, 2023Updated 3 years ago
LgQu / TIGeR
View on GitHub
Code for paper: Unified Text-to-Image Generation and Retrieval
☆16Updated this week
EvanZhuang / AgenticLU
View on GitHub
Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025).
☆13Sep 22, 2025Updated 9 months ago
infinigence / Semi-PD
View on GitHub
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆127Dec 25, 2025Updated 6 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
tengxiaoliu / RLET
View on GitHub
[EMNLP 2022] RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees
☆11Jul 15, 2023Updated 3 years ago
open-nlplab / fastchatgpt
View on GitHub
A python tool help to interact with chatgpt.
☆10Dec 11, 2022Updated 3 years ago
hellangleZ / Qwen3_autothink_adapter
View on GitHub
Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…
☆22May 9, 2025Updated last year
candlezang / up-embedded-QT
View on GitHub
尚观嵌入式课程QT图形界面课程
☆11Dec 26, 2016Updated 9 years ago
choosewhatulike / cluster-clip
View on GitHub
Multi-GPU supported kmeans clustering for cluser-clip
☆15Jun 3, 2024Updated 2 years ago
pixeli99 / MixLN
View on GitHub
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30Jul 24, 2025Updated 11 months ago
GAIR-NLP / weak-to-strong-reasoning
View on GitHub
☆59Sep 2, 2024Updated last year