stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆87 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for awesome-mobile-llm
- Codebase for "MELTing Point: Mobile Evaluation of Language Transformers" ☆13 · Updated 4 months ago
- EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs); see the early-exit sketch after this list. ☆49 · Updated 5 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆135 · Updated 5 months ago
- Awesome list for LLM quantization ☆127 · Updated this week
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆120 · Updated 3 weeks ago
- Efficient LLM Inference Acceleration using Prompting ☆43 · Updated last month
- Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆248 · Updated this week
- The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework". ☆48 · Updated 3 weeks ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆143 · Updated this week
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆146 · Updated 8 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆87 · Updated last month
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 · Updated 5 months ago
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ☆74 · Updated last month
- LLM Serving Performance Evaluation Harness ☆56 · Updated 2 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆166 · Updated 3 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆241 · Updated last month
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! ☆313 · Updated 2 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆224 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆229 · Updated 3 weeks ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆68 · Updated 5 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆51 · Updated 6 months ago
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆147 · Updated 4 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆226 · Updated last month
- A minimal implementation of vLLM. ☆30 · Updated 3 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆172 · Updated this week
- A large-scale simulation framework for LLM inference ☆278 · Updated this week
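Several entries above (EE-LLM, LayerSkip) center on early-exit inference: a token is emitted from an intermediate layer as soon as the model is confident enough, so the remaining layers are skipped. Below is a minimal, illustrative sketch of that idea in PyTorch; the toy model, the shared exit head, and the confidence threshold are assumptions for exposition, not the API of any repository listed here.

```python
# Toy early-exit decoding sketch (illustrative only; not the EE-LLM or
# LayerSkip implementation). A stack of transformer layers shares a single
# LM head; after each layer the hidden state is projected to vocabulary
# logits, and decoding exits as soon as the top token clears a threshold.
import torch
import torch.nn as nn

class ToyEarlyExitLM(nn.Module):
    def __init__(self, vocab_size=100, d_model=64, n_layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.lm_head = nn.Linear(d_model, vocab_size)  # shared exit head

    @torch.no_grad()
    def next_token(self, input_ids, threshold=0.9):
        h = self.embed(input_ids)
        token = None
        for depth, layer in enumerate(self.layers, start=1):
            h = layer(h)
            probs = self.lm_head(h[:, -1]).softmax(-1)  # next-token distribution
            conf, token = probs.max(-1)
            if conf.item() >= threshold:   # confident enough: exit early
                return token, depth        # and skip the deeper layers
        return token, len(self.layers)     # fell through: used full depth

model = ToyEarlyExitLM().eval()
token, depth = model.next_token(torch.tensor([[1, 2, 3]]))
print(f"emitted token {token.item()} after {depth} of {len(model.layers)} layers")
```

LayerSkip pairs this with self-speculative decoding, in which early-exit drafts are verified by the remaining layers; that verification step is omitted here for brevity.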