EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).
☆74Jun 14, 2024Updated last year
Alternatives and similar repositories for EE-LLM
Users who are interested in EE-LLM are comparing it to the libraries listed below.
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆361Feb 5, 2026Updated last month
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…☆67Jun 26, 2024Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- Improving transparency of large language models' reasoning☆14Nov 25, 2025Updated 3 months ago
- MPI Code Generation through Domain-Specific Language Models☆15Nov 19, 2024Updated last year
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data.☆31Jan 21, 2026Updated last month
- A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20.☆24Jul 18, 2025Updated 7 months ago
- Code for paper "Patch-Level Training for Large Language Models"☆96Nov 10, 2025Updated 4 months ago
- A Gradio app for analyzing audio files to determine true sample rate and bit depth.☆19Sep 17, 2024Updated last year
- PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References☆20Jun 13, 2024Updated last year
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆177Jun 20, 2024Updated last year
- ☆52Jul 18, 2024Updated last year
- qwen-nsa☆87Oct 14, 2025Updated 4 months ago
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- A huge dataset for Document Visual Question Answering☆20Jul 29, 2024Updated last year
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- ☆29Feb 27, 2025Updated last year
- Code for ACL2022 publication Transkimmer: Transformer Learns to Layer-wise Skim☆22Aug 21, 2022Updated 3 years ago
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache☆55Jan 26, 2026Updated last month
- Official PyTorch Implementation of the Longhorn Deep State Space Model☆56Dec 4, 2024Updated last year
- ☆92Nov 25, 2023Updated 2 years ago
- This is an open-source implementation of MixedAE (https://arxiv.org/pdf/2303.17152.pdf)☆22Feb 14, 2025Updated last year
- [ICLR 2025] Official implementation of DICL (Disentangled In-Context Learning), featured in the paper "Zero-shot Model-based Reinforcemen…☆26Feb 14, 2025Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated last year
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…☆31May 7, 2024Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆1,131Jan 24, 2026Updated last month
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆30Apr 27, 2024Updated last year
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning☆64Aug 2, 2024Updated last year
- ☆30Jul 22, 2024Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM☆179Jul 12, 2024Updated last year
- Work in progress.☆79Nov 25, 2025Updated 3 months ago
- Train, tune, and infer Bamba model☆137Jun 4, 2025Updated 9 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆259Dec 16, 2024Updated last year
- Longitudinal Evaluation of LLMs via Data Compression☆33May 29, 2024Updated last year
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity"☆32May 28, 2025Updated 9 months ago
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs☆53Dec 7, 2025Updated 3 months ago
- My fork of Allen AI's OLMo for educational purposes.☆28Dec 5, 2024Updated last year
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆277Aug 31, 2024Updated last year