hemingkx / SWIFT
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆22 Updated last month
Related projects
Alternatives and complementary repositories for SWIFT
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆113 Updated last week
- ☆31 Updated 2 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆52 Updated last month
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". ☆36 Updated this week
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models ☆53 Updated 3 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)" ☆34 Updated 7 months ago
- Multi-Candidate Speculative Decoding ☆28 Updated 6 months ago
- OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure ☆18 Updated 3 months ago
- ☆15 Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆75 Updated last month
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆67 Updated 5 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆21 Updated 2 weeks ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆36 Updated 3 weeks ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆21 Updated 4 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆76 Updated 3 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆47 Updated 2 weeks ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆134 Updated last month
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆96 Updated 7 months ago
- Towards Systematic Measurement for Long Text Quality ☆28 Updated 2 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 Updated 7 months ago
- The official repository of the Omni-MATH benchmark. ☆47 Updated last week
- Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆61 Updated last week
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models ☆66 Updated 3 weeks ago
- Long Context Extension and Generalization in LLMs ☆39 Updated last month
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ☆33 Updated 11 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆68 Updated 3 weeks ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆72 Updated 8 months ago
- ☆27 Updated last year
- A method of ensemble learning for heterogeneous large language models. ☆30 Updated 3 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆38 Updated 3 months ago