uservan / speculative_thinking
☆16 · Updated last month
Alternatives and similar repositories for speculative_thinking
Users interested in speculative_thinking are comparing it to the repositories listed below.
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆44 · Updated 6 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆26 · Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 7 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆45 · Updated 6 months ago
- Code for the EMNLP 2024 paper "A simple and effective L2 norm based method for KV Cache compression" ☆12 · Updated 4 months ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆46 · Updated 5 months ago
- Official Implementation of FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation ☆19 · Updated last month
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 · Updated this week
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆37 · Updated last year
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆63 · Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 · Updated 2 months ago
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆19 · Updated 2 weeks ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆36 · Updated 3 weeks ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆21 · Updated 4 months ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" ☆16 · Updated 7 months ago
- [ICML 2025] Official repo for the paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆16 · Updated 2 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ☆34 · Updated 3 weeks ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆52 · Updated 2 months ago
- The official implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆73 · Updated 3 months ago
- More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression ☆11 · Updated 3 months ago