feifeibear / LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
☆807 · Updated last year
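The technique this repository implements, speculative decoding, works by letting a cheap draft model propose several tokens and having the target model accept or resample each one so the final output distribution matches ordinary sampling from the target. A minimal NumPy sketch of the accept/reject rule, assuming per-step draft and target distributions are already computed (hypothetical helper, not this repo's API):

```python
import numpy as np

def speculative_accept(draft_probs, target_probs, draft_tokens, rng=None):
    """Accept/reject loop of speculative sampling (illustrative sketch).

    draft_probs, target_probs: (k, vocab) arrays of per-step distributions
    draft_tokens: k token ids proposed by the draft model
    Returns the accepted tokens; stops at the first rejection, where a
    token is resampled from the residual distribution max(p - q, 0).
    """
    rng = rng or np.random.default_rng(0)
    accepted = []
    for t, (q, p) in zip(draft_tokens, zip(draft_probs, target_probs)):
        # accept token t with probability min(1, p(t)/q(t))
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)
        else:
            # resample from the normalized residual so the overall
            # distribution equals sampling directly from the target
            residual = np.maximum(p - q, 0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(p), p=residual)))
            break
    return accepted
```

When draft and target distributions agree, every proposed token is accepted and the target model only needs one verification pass for k tokens, which is where the speedup comes from.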
Alternatives and similar repositories for LLMSpeculativeSampling
Users interested in LLMSpeculativeSampling are comparing it to the libraries listed below.
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆890 · Updated last week
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆306 · Updated 4 months ago
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. ☆467 · Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3. ☆1,490 · Updated this week
- LongBench v2 and LongBench (ACL '25 & '24) ☆951 · Updated 7 months ago
- Ring attention implementation with flash attention ☆841 · Updated 3 weeks ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆517 · Updated 3 weeks ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆549 · Updated last month
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆537 · Updated 11 months ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆981 · Updated 8 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,272 · Updated 5 months ago
- Best practice for training LLaMA models in Megatron-LM ☆660 · Updated last year
- ☆332 · Updated last year
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… ☆1,053 · Updated 10 months ago
- [TMLR 2024] Efficient Large Language Models: A Survey ☆1,200 · Updated 2 months ago
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ☆45 · Updated 5 months ago
- Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes. ☆351 · Updated 5 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆630 · Updated last year
- Disaggregated serving system for Large Language Models (LLMs). ☆669 · Updated 4 months ago
- The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud. ☆1,307 · Updated this week
- A flexible and efficient training framework for large-scale alignment tasks ☆415 · Updated this week
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆201 · Updated 6 months ago
- A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks ☆265 · Updated last year
- ☆273 · Updated last month
- Awesome list for LLM pruning. ☆251 · Updated this week
- FlagScale is a large-model toolkit based on open-source projects. ☆346 · Updated last week
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆207 · Updated 8 months ago
- slime is an LLM post-training framework aiming for RL Scaling. ☆1,420 · Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆309 · Updated 4 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆735 · Updated 5 months ago