MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,410 · Apr 21, 2025 · Updated 10 months ago
Alternatives and similar repositories for MobileLLM
Users interested in MobileLLM are comparing it to the repositories listed below.
- Everything about the SmolLM and SmolVLM family of models ☆3,636 · Jan 13, 2026 · Updated last month
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,896 · May 3, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,625 · Sep 10, 2025 · Updated 5 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,334 · Apr 15, 2024 · Updated last year
- Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model". ☆927 · Oct 28, 2024 · Updated last year
- 4M: Massively Multimodal Masked Modeling ☆1,788 · Jun 2, 2025 · Updated 8 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,125 · May 19, 2025 · Updated 9 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,187 · Jul 11, 2024 · Updated last year
- On-device AI across mobile, embedded and edge for PyTorch ☆4,312 · Updated this week
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆668 · May 10, 2025 · Updated 9 months ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,644 · Aug 10, 2024 · Updated last year
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,084 · Jul 29, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,752 · Jul 18, 2025 · Updated 7 months ago
- PyTorch native quantization and sparsity for training and inference ☆2,707 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,441 · Jul 17, 2025 · Updated 7 months ago
- Minimalistic large language model 3D-parallelism training ☆2,579 · Feb 19, 2026 · Updated last week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale ☆13,182 · Feb 22, 2026 · Updated last week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆358 · Feb 5, 2026 · Updated 3 weeks ago
- High-speed Large Language Model Serving for Local Deployment ☆8,729 · Jan 24, 2026 · Updated last month
- Tools for merging pretrained large language models ☆6,814 · Jan 26, 2026 · Updated last month
- CoreNet: A library for training deep neural networks ☆7,011 · Oct 9, 2025 · Updated 4 months ago
- GRadient-INformed MoE ☆264 · Sep 25, 2024 · Updated last year
- Efficient Triton Kernels for LLM Training ☆6,162 · Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,699 · Feb 12, 2026 · Updated 2 weeks ago
- PyTorch native post-training library ☆5,691 · Updated this week
- Modeling, training, eval, and inference code for OLMo ☆6,326 · Nov 24, 2025 · Updated 3 months ago
- Things you can do with the token embeddings of an LLM ☆1,453 · Dec 1, 2025 · Updated 3 months ago
- ☆3,080 · Nov 21, 2025 · Updated 3 months ago
- Data preparation code for Amber 7B LLM ☆93 · May 10, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆71,234 · Updated this week
- Fast and memory-efficient exact attention ☆22,361 · Updated this week
- 🍃 MINT-1T: A one trillion token multimodal interleaved dataset ☆826 · Jul 31, 2024 · Updated last year
- VPTQ: A flexible and extreme low-bit quantization algorithm ☆674 · Apr 25, 2025 · Updated 10 months ago
- PyTorch implementation of models from the Zamba2 series ☆187 · Jan 23, 2025 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆913 · Dec 18, 2025 · Updated 2 months ago
- Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆18,220 · Nov 3, 2025 · Updated 3 months ago
- Mamba SSM architecture ☆17,257 · Feb 18, 2026 · Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,108 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models ☆23,905 · Updated this week