stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆169 · Updated last month
Alternatives and similar repositories for awesome-mobile-llm:
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 · Updated 7 months ago
- ☆56 · Updated 5 months ago
- Fast Multimodal LLM on Mobile Devices ☆824 · Updated last month
- High-speed and easy-to-use LLM serving framework for local deployment ☆99 · Updated last month
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆259 · Updated 2 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆424 · Updated 3 months ago
- Awesome list for LLM quantization ☆201 · Updated 4 months ago
- Advanced quantization algorithm for LLMs/VLMs ☆438 · Updated this week
- A family of compressed models obtained via pruning and knowledge distillation ☆334 · Updated 5 months ago
- LLM-Inference-Bench ☆39 · Updated 3 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆343 · Updated 2 months ago
- Automated identification of redundant layer blocks for pruning in large language models ☆234 · Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆283 · Updated 3 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated 7 months ago
- Evaluate and enhance your LLM deployments for real-world inference needs ☆266 · Updated this week
- Codebase for "MELTing Point: Mobile Evaluation of Language Transformers" ☆18 · Updated 9 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆643 · Updated last month
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ☆181 · Updated 3 months ago
- ☆208 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 6 months ago
- Analyze the inference of Large Language Models (LLMs): aspects like computation, storage, transmission, and hardware roofline mod… ☆444 · Updated 7 months ago
- Simple extension on vLLM to help you speed up reasoning models without training ☆146 · Updated this week
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆263 · Updated 6 months ago
- ☆90 · Updated 6 months ago
- A collection of all available inference solutions for LLMs ☆86 · Updated last month
- ☆531 · Updated 5 months ago
- This repository contains the training code for ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆52 · Updated 3 weeks ago
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆115 · Updated last month
- Survey paper list: efficient LLMs and foundation models ☆246 · Updated 7 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆288 · Updated 3 months ago
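Low-bit quantization is the recurring theme across the repositories above (MobileQuant, SpinQuant, QServe, ParetoQ, KIVI, and others). As a toy illustration only, and not the scheme used by any specific project listed here, per-tensor symmetric round-to-nearest 4-bit quantization can be sketched as:

```python
# Minimal per-tensor symmetric 4-bit quantization sketch (illustrative;
# the toolkits listed above use far more elaborate, often learned, schemes).

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integer codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.44]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q)        # → [1, -4, 7, -1, 3], each code fits in 4 bits
print(max_err)  # round-trip error, bounded by half a quantization step
```

Storing one 4-bit code per weight instead of a 16- or 32-bit float is where the memory savings come from; the listed projects differ mainly in how they choose scales (per-channel, learned rotations, quantization-aware training) to shrink that round-trip error.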