pan-x-c / EE-LLM
EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).
☆74 · Jun 14, 2024 · Updated last year
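EE-LLM's core idea is to attach exit heads to intermediate transformer layers and stop computing as soon as one head is confident enough. As a rough illustration of that decision rule (not EE-LLM's actual API — `early_exit_predict`, the threshold value, and the toy per-layer logits below are all assumptions for the sketch):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(layer_logits, threshold=0.9):
    """Return (layer_index, token_id) for the first exit head whose
    top softmax probability clears `threshold`; fall back to the
    final (full-depth) head otherwise."""
    for i, logits in enumerate(layer_logits):
        probs = softmax(logits)
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:
            return i, top
    probs = softmax(layer_logits[-1])
    top = max(range(len(probs)), key=probs.__getitem__)
    return len(layer_logits) - 1, top

# Toy per-layer logits for a 4-layer model over a 3-token vocabulary:
# confidence grows with depth, so decoding can stop at layer 2.
layers = [
    [0.2, 0.1, 0.0],   # layer 0: nearly uniform
    [1.0, 0.2, 0.1],   # layer 1: leaning toward token 0
    [4.0, 0.5, 0.2],   # layer 2: confident in token 0
    [6.0, 0.3, 0.1],   # layer 3: final head
]
print(early_exit_predict(layers, threshold=0.9))  # → (2, 0)
```

The saved depth is where the speedup comes from: easy tokens exit early, hard tokens still traverse the full network.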
Alternatives and similar repositories for EE-LLM
Users interested in EE-LLM are comparing it to the repositories listed below.
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆357 · Feb 5, 2026 · Updated last week
- Official implementation of "Extending LLMs' Context Window with 100 Samples" ☆81 · Jan 18, 2024 · Updated 2 years ago
- MPI Code Generation through Domain-Specific Language Models ☆14 · Nov 19, 2024 · Updated last year
- A fork of flux-fast that makes it even faster with cache-dit, with a 3.3x speedup on the NVIDIA L20 ☆24 · Jul 18, 2025 · Updated 6 months ago
- [NeurIPS 2025, Spotlight] Ambient-o: Training Good Models with Bad Data ☆30 · Jan 21, 2026 · Updated 3 weeks ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆97 · Nov 10, 2025 · Updated 3 months ago
- A Gradio app for analyzing audio files to determine their true sample rate and bit depth ☆19 · Sep 17, 2024 · Updated last year
- PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References ☆20 · Jun 13, 2024 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆260 · Nov 18, 2024 · Updated last year
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆177 · Jun 20, 2024 · Updated last year
- qwen-nsa ☆87 · Oct 14, 2025 · Updated 3 months ago
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆29 · Feb 27, 2025 · Updated 11 months ago
- A huge dataset for Document Visual Question Answering ☆20 · Jul 29, 2024 · Updated last year
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache ☆55 · Jan 26, 2026 · Updated 2 weeks ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Dec 4, 2024 · Updated last year
- An open-source implementation of MixedAE (https://arxiv.org/pdf/2303.17152.pdf) ☆22 · Feb 14, 2025 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding", Zhenyu Zhang, Runjin Chen, Shiw… ☆31 · May 7, 2024 · Updated last year
- [ICLR 2025] Official implementation of DICL (Disentangled In-Context Learning), featured in the paper "Zero-shot Model-based Reinforcemen… ☆26 · Feb 14, 2025 · Updated 11 months ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,121 · Jan 24, 2026 · Updated 2 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆176 · Jul 12, 2024 · Updated last year
- ☆67 · Oct 25, 2025 · Updated 3 months ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆31 · May 28, 2025 · Updated 8 months ago
- Work in progress ☆79 · Nov 25, 2025 · Updated 2 months ago
- Train, tune, and infer the Bamba model ☆137 · Jun 4, 2025 · Updated 8 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆260 · Dec 16, 2024 · Updated last year
- Longitudinal Evaluation of LLMs via Data Compression ☆33 · May 29, 2024 · Updated last year
- [AAAI 2026] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆52 · Dec 7, 2025 · Updated 2 months ago
- A fork of Allen AI's OLMo for educational purposes ☆29 · Dec 5, 2024 · Updated last year
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆276 · Aug 31, 2024 · Updated last year
- Official code for the paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆143 · Apr 8, 2025 · Updated 10 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆65 · Oct 31, 2025 · Updated 3 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Jun 17, 2024 · Updated last year
- Code for the NeurIPS 2024 Spotlight "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆89 · Oct 30, 2024 · Updated last year
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models ☆79 · Oct 16, 2024 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Mar 7, 2025 · Updated 11 months ago
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ☆30 · Mar 14, 2024 · Updated last year
- XmodelLM ☆38 · Nov 19, 2024 · Updated last year
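Several of the repositories above (LayerSkip, TriForce, the speculative-decoding reading list) revolve around speculative decoding: a cheap draft model proposes several tokens and the full target model verifies them in one pass. A minimal greedy draft-then-verify sketch, where the toy next-token functions `tgt` and `drf` are hypothetical stand-ins for real target and draft models:

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=8):
    """Greedy draft-then-verify loop: the draft model proposes k tokens,
    the target keeps the longest prefix it agrees with, then supplies
    one corrected token itself (guaranteeing progress)."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) draft k tokens autoregressively with the cheap model
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) verify: accept draft tokens while the target agrees
        ctx = list(out)
        for t in draft:
            if target_next(ctx) != t:
                break
            out.append(t)
            ctx.append(t)
        # 3) the target always contributes the next token itself
        out.append(target_next(out))
    return out[:len(prompt) + max_new]  # trim any overshoot

# Toy "models": the next token depends only on the last token.
target = {0: 1, 1: 2, 2: 3, 3: 0}.__getitem__  # exact 0→1→2→3 cycle
drafty = {0: 1, 1: 2, 2: 0, 3: 0}.__getitem__  # diverges after token 2
tgt = lambda ctx: target(ctx[-1])
drf = lambda ctx: drafty(ctx[-1])
print(speculative_decode(tgt, drf, [0], k=3, max_new=6))
# → [0, 1, 2, 3, 0, 1, 2]
```

The output always matches the target model's own greedy decoding; the draft only changes how many target invocations are needed, which is where LayerSkip's twist comes in: its early-exit sub-network serves as the draft, so no separate draft model is required.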