pan-x-c / EE-LLM
EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).
☆74 · Updated Jun 14, 2024 (last year)
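As a quick orientation for readers new to the idea: an early-exit LLM attaches lightweight language-model heads to intermediate transformer layers, so a token can be emitted as soon as an intermediate head is confident enough, skipping the remaining layers. The sketch below is a minimal conceptual illustration of that confidence-threshold exit rule only; it is not EE-LLM's API, and all names (EarlyExitLM, next_token, threshold) are invented for illustration.

```python
# Conceptual sketch of confidence-based early exit (not EE-LLM's actual API).
# Each layer gets its own LM head; decoding for a token stops at the first
# layer whose head is confident enough about the next token.
import torch
import torch.nn as nn

class EarlyExitLM(nn.Module):
    def __init__(self, vocab_size=100, d_model=64, n_layers=4, threshold=0.9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One lightweight LM head per layer: the "early exits".
        self.exit_heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_layers)
        )
        self.threshold = threshold  # confidence required to exit early

    @torch.no_grad()
    def next_token(self, input_ids):
        h = self.embed(input_ids)
        for depth, (layer, head) in enumerate(zip(self.layers, self.exit_heads)):
            h = layer(h)
            probs = head(h[:, -1]).softmax(-1)  # next-token distribution at this depth
            conf, token = probs.max(-1)
            if conf.item() >= self.threshold:   # confident enough: exit here
                return token.item(), depth
        return token.item(), depth              # fell through: used all layers

model = EarlyExitLM()
token, exit_layer = model.next_token(torch.tensor([[1, 2, 3]]))
print(f"predicted token {token} after exiting at layer {exit_layer}")
```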
Alternatives and similar repositories for EE-LLM
Users interested in EE-LLM are comparing it to the repositories listed below
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) · ☆65 · Updated Sep 28, 2024 (last year)
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ☆357 · Updated Feb 5, 2026 (last week)
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… · ☆65 · Updated Jun 26, 2024 (last year)
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' · ☆81 · Updated Jan 18, 2024 (2 years ago)
- Improving transparency of large language models' reasoning · ☆14 · Updated Nov 25, 2025 (2 months ago)
- MPI Code Generation through Domain-Specific Language Models · ☆14 · Updated Nov 19, 2024 (last year)
- A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20. · ☆24 · Updated Jul 18, 2025 (6 months ago)
- [NeurIPS 2025, Spotlight] Ambient-o: Training Good models with Bad Data. · ☆30 · Updated Jan 21, 2026 (3 weeks ago)
- Code for paper "Patch-Level Training for Large Language Models" · ☆97 · Updated Nov 10, 2025 (3 months ago)
- A Gradio app for analyzing audio files to determine true sample rate and bit depth. · ☆19 · Updated Sep 17, 2024 (last year)
- PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References · ☆20 · Updated Jun 13, 2024 (last year)
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration · ☆260 · Updated Nov 18, 2024 (last year)
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆177 · Updated Jun 20, 2024 (last year)
- ☆52 · Updated Jul 18, 2024 (last year)
- qwen-nsa · ☆87 · Updated Oct 14, 2025 (4 months ago)
- Quantized Attention on GPU · ☆44 · Updated Nov 22, 2024 (last year)
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". · ☆30 · Updated Nov 12, 2024 (last year)
- A huge dataset for Document Visual Question Answering · ☆20 · Updated Jul 29, 2024 (last year)
- Code for ACL 2022 publication Transkimmer: Transformer Learns to Layer-wise Skim · ☆22 · Updated Aug 21, 2022 (3 years ago)
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache · ☆55 · Updated Jan 26, 2026 (2 weeks ago)
- Official PyTorch Implementation of the Longhorn Deep State Space Model · ☆56 · Updated Dec 4, 2024 (last year)
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Updated Jun 6, 2024 (last year)
- ☆24 · Updated Apr 17, 2024 (last year)
- An open-source implementation of MixedAE (https://arxiv.org/pdf/2303.17152.pdf) · ☆22 · Updated Feb 14, 2025 (last year)
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… · ☆31 · Updated May 7, 2024 (last year)
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens · ☆25 · Updated Nov 6, 2023 (2 years ago)
- [ICLR 2025] Official implementation of DICL (Disentangled In-Context Learning), featured in the paper "Zero-shot Model-based Reinforcemen… · ☆26 · Updated Feb 14, 2025 (last year)
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ · ☆1,121 · Updated Jan 24, 2026 (3 weeks ago)
- Memo's Blog · ☆27 · Updated Jan 21, 2026 (3 weeks ago)
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" · ☆30 · Updated Apr 27, 2024 (last year)
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning · ☆64 · Updated Aug 2, 2024 (last year)
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM · ☆176 · Updated Jul 12, 2024 (last year)
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" · ☆31 · Updated May 28, 2025 (8 months ago)
- ☆30 · Updated Jul 22, 2024 (last year)
- ☆67 · Updated Oct 25, 2025 (3 months ago)
- Work in progress. · ☆79 · Updated Nov 25, 2025 (2 months ago)
- Train, tune, and infer Bamba model · ☆137 · Updated Jun 4, 2025 (8 months ago)
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs · ☆260 · Updated Dec 16, 2024 (last year)
- ☆31 · Updated Sep 23, 2024 (last year)