thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
☆60 · Updated 6 months ago
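Ouroboros builds on speculative decoding: a cheap draft model proposes several tokens ahead, and the target model verifies them, keeping the accepted prefix and correcting the first mismatch. Below is a minimal greedy-verification sketch using hypothetical toy models (`draft_model`, `target_model` are stand-ins, not the Ouroboros implementation):

```python
def draft_model(prefix, k):
    # Toy draft model: predicts each next token as (last token + 1) mod 10.
    out, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(last)
    return out

def target_model(prefix):
    # Toy target model: same rule as the draft, except after token 3 it emits 7,
    # so the draft's guesses occasionally diverge and get corrected.
    last = prefix[-1]
    return 7 if last == 3 else (last + 1) % 10

def speculative_decode(prefix, steps, k=4):
    tokens = list(prefix)
    while len(tokens) < len(prefix) + steps:
        proposal = draft_model(tokens, k)   # draft proposes k tokens cheaply
        ctx, accepted = list(tokens), []
        for tok in proposal:
            expected = target_model(ctx)    # target verifies each position
            if tok == expected:
                accepted.append(tok)        # keep matching draft tokens
                ctx.append(tok)
            else:
                accepted.append(expected)   # correct the first mismatch, stop
                break
        tokens.extend(accepted)
    return tokens[:len(prefix) + steps]

print(speculative_decode([0, 1], steps=6))  # → [0, 1, 2, 3, 7, 8, 9, 0]
```

When the draft agrees with the target, several tokens are accepted per verification round, which is where the speedup comes from; the output is identical to decoding with the target model alone.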
Related projects:
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆56 · Updated 7 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 · Updated this week
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆115 · Updated 2 weeks ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆133 · Updated 3 months ago
- Repository of the LV-Eval benchmark ☆41 · Updated 2 weeks ago
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification ☆21 · Updated 7 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆134 · Updated 2 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆183 · Updated last month
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆158 · Updated 4 months ago
- Official implementation for the paper "🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving" ☆57 · Updated 3 weeks ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆81 · Updated 2 weeks ago
- Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting ☆39 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆47 · Updated 2 weeks ago
- Multi-Candidate Speculative Decoding ☆27 · Updated 4 months ago
- [EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches ☆65 · Updated 6 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆101 · Updated last week
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models ☆148 · Updated 6 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" ☆121 · Updated 3 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆72 · Updated 6 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search ☆91 · Updated 3 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆69 · Updated 6 months ago
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆127 · Updated 3 months ago
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" ☆87 · Updated 2 months ago
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 · Updated 8 months ago
- Towards Systematic Measurement for Long Text Quality ☆27 · Updated 2 weeks ago
- A repository sharing the literature on large language models ☆19 · Updated 3 weeks ago