facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆229 · Updated 3 weeks ago
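LayerSkip pairs early exit (drafting tokens with only the first few transformer layers) with self-speculative decoding, in which the full model then verifies the drafted tokens. The draft-then-verify loop can be sketched with hypothetical toy models standing in for the early-exit and full forward passes; this is an illustrative sketch, not the repository's implementation:

```python
# Toy sketch of the draft-then-verify loop behind self-speculative
# decoding. `full_model` and `draft_model` are hypothetical stand-ins,
# not real transformer layers.

def full_model(prefix):
    # "Full" model: a deterministic toy next-token rule
    # (stand-in for running all transformer layers).
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_model(prefix):
    # "Draft" model: a cheaper approximation (stand-in for exiting at
    # an early layer). It agrees with the full model most of the time.
    t = full_model(prefix)
    return t if t % 7 else (t + 1) % 50  # occasionally wrong

def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding: draft k tokens cheaply, then verify
    them against the full model, keeping the longest agreeing prefix."""
    seq = list(prompt)
    target = len(prompt) + n_new
    while len(seq) < target:
        # 1) Draft k candidate tokens with the cheap model.
        ctx, draft = list(seq), []
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept drafted tokens while the full model agrees;
        #    the first disagreement yields the full model's token instead.
        for t in draft:
            full_t = full_model(seq)
            seq.append(full_t)  # always append the verified token
            if full_t != t or len(seq) >= target:
                break
    return seq[:target]
```

With greedy decoding, the verify step always appends the full model's own token, so the output is identical to ordinary full-model decoding; the draft only determines how many tokens each verification pass can confirm at once.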
Related projects
Alternatives and complementary repositories for LayerSkip
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- An Open Source Toolkit For LLM Distillation ☆356 · Updated 2 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆196 · Updated 6 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆236 · Updated 4 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆283 · Updated last week
- ☆184 · Updated last month
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆180 · Updated 3 weeks ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆145 · Updated last week
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆293 · Updated 11 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last month
- ☆105 · Updated 2 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆252 · Updated last year
- awesome synthetic (text) datasets ☆242 · Updated 3 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆177 · Updated last month
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 · Updated 5 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆307 · Updated 7 months ago
- Code for training & evaluating Contextual Document Embedding models ☆117 · Updated this week
- A compact LLM pretrained in 9 days by using high-quality data ☆262 · Updated last month
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆193 · Updated this week
- The official evaluation suite and dynamic data release for MixEval. ☆224 · Updated last week
- ☆122 · Updated 9 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆165 · Updated 2 weeks ago
- ☆451 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆685 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆262 · Updated last year
- Prune transformer layers ☆64 · Updated 5 months ago
- ☆118 · Updated 3 months ago
- A simple unified framework for evaluating LLMs ☆145 · Updated last week