MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
★1,420 · Apr 21, 2025 · Updated 11 months ago
Alternatives and similar repositories for MobileLLM
Users that are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models ★3,705 · Apr 2, 2026 · Updated last week
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ★668 · May 10, 2025 · Updated 11 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ★1,350 · Apr 15, 2024 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ★8,933 · May 3, 2024 · Updated last year
- 4M: Massively Multimodal Masked Modeling ★1,792 · Jun 2, 2025 · Updated 10 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ★929 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ★3,620 · Sep 10, 2025 · Updated 7 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" ★123 · Oct 15, 2025 · Updated 5 months ago
- Data preparation code for Amber 7B LLM ★94 · May 10, 2024 · Updated last year
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ★3,138 · May 19, 2025 · Updated 10 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,488 · Jul 17, 2025 · Updated 8 months ago
- PyTorch native quantization and sparsity for training and inference ★2,769 · Updated this week
- Minimalistic large language model 3D-parallelism training ★2,644 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ★2,094 · Jul 29, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ★363 · Feb 5, 2026 · Updated 2 months ago
- On-device AI across mobile, embedded and edge for PyTorch ★4,469 · Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ★4,773 · Aug 10, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★7,208 · Jul 11, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ★4,757 · Jul 18, 2025 · Updated 8 months ago
- PyTorch implementation of models from the Zamba2 series. ★193 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. ★6,945 · Mar 15, 2026 · Updated 3 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ★9,962 · Mar 4, 2026 · Updated last month
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ★13,280 · Apr 4, 2026 · Updated last week
- Things you can do with the token embeddings of an LLM ★1,454 · Dec 1, 2025 · Updated 4 months ago
- Fast and memory-efficient exact attention ★23,185 · Updated this week
- CoreNet: A library for training deep neural networks ★7,004 · Oct 9, 2025 · Updated 6 months ago
- PyTorch native post-training library ★5,728 · Updated this week
- High-speed Large Language Model Serving for Local Deployment ★9,275 · Jan 24, 2026 · Updated 2 months ago
- Efficient Triton Kernels for LLM Training ★6,265 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ★925 · Feb 26, 2026 · Updated last month
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ★676 · Apr 25, 2025 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★75,637 · Updated this week
- ★3,085 · Nov 21, 2025 · Updated 4 months ago
- Modeling, training, eval, and inference code for OLMo ★6,463 · Nov 24, 2025 · Updated 4 months ago
- Local realtime voice AI ★2,477 · Nov 26, 2025 · Updated 4 months ago
- Code repo for the paper "SpinQuant LLM quantization with learned rotations" ★383 · Feb 14, 2025 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ★18,278 · Apr 1, 2026 · Updated last week
- llama3 implementation one matrix multiplication at a time ★15,244 · May 23, 2024 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ★10,865 · Jun 10, 2024 · Updated last year