teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆114 · Updated last year
Alternatives and similar repositories for parallel-decoding:
Users interested in parallel-decoding are comparing it to the repositories listed below.
- ☆101 · Updated last year
- 🔥 A minimal training framework for scaling FLA models ☆79 · Updated this week
- Sparse Backpropagation for Mixture-of-Expert Training ☆28 · Updated 8 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆56 · Updated 5 months ago
- ☆139 · Updated last year
- ☆49 · Updated 10 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆27 · Updated 11 months ago
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆70 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers ☆96 · Updated 2 years ago
- ☆36 · Updated 6 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆207 · Updated 7 months ago
- ☆94 · Updated 9 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 5 months ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆89 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts ☆207 · Updated 3 months ago
- ☆87 · Updated 5 months ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆82 · Updated 2 years ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆74 · Updated 9 months ago
- ☆47 · Updated last year
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… ☆26 · Updated 10 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆163 · Updated last week
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆60 · Updated last year
- Sirius, an efficient correction mechanism that significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆21 · Updated 6 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆53 · Updated 5 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆36 · Updated 11 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 11 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 5 months ago
- ☆115 · Updated last month