MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
★ 1,438 · Apr 30, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for MobileLLM
Users who are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models · ★ 3,777 · Apr 2, 2026 · Updated last month
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices · ★ 668 · May 10, 2025 · Updated last year
- Strong and Open Vision Language Assistant for Mobile Devices · ★ 1,353 · Apr 15, 2024 · Updated 2 years ago
- 4M: Massively Multimodal Masked Modeling · ★ 1,796 · Jun 2, 2025 · Updated 11 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. · ★ 8,963 · May 3, 2024 · Updated 2 years ago
- Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model". · ★ 931 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile · ★ 3,625 · Sep 10, 2025 · Updated 8 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" · ★ 124 · Oct 15, 2025 · Updated 7 months ago
- Data preparation code for Amber 7B LLM · ★ 95 · May 10, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ★ 3,536 · Jul 17, 2025 · Updated 10 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… · ★ 3,142 · May 19, 2025 · Updated last year
- PyTorch native quantization and sparsity for training and inference · ★ 2,825 · May 15, 2026 · Updated last week
- On-device AI across mobile, embedded and edge for PyTorch · ★ 4,622 · Updated this week
- Minimalistic large language model 3D-parallelism training · ★ 2,690 · Apr 7, 2026 · Updated last month
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ★ 2,096 · Jul 29, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ★ 370 · Apr 13, 2026 · Updated last month
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ★ 7,229 · Jul 11, 2024 · Updated last year
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality · ★ 4,917 · Aug 10, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. · ★ 4,760 · Jul 18, 2025 · Updated 10 months ago
- PyTorch implementation of models from the Zamba2 series. · ★ 192 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. · ★ 7,083 · May 6, 2026 · Updated 2 weeks ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. · ★ 13,364 · May 1, 2026 · Updated 3 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… · ★ 10,211 · May 5, 2026 · Updated 2 weeks ago
- Things you can do with the token embeddings of an LLM · ★ 1,451 · Dec 1, 2025 · Updated 5 months ago
- Fast and memory-efficient exact attention · ★ 23,836 · Updated this week
- CoreNet: A library for training deep neural networks · ★ 7,002 · Oct 9, 2025 · Updated 7 months ago
- Efficient Triton Kernels for LLM Training · ★ 6,365 · Updated this week
- PyTorch native post-training library · ★ 5,754 · May 15, 2026 · Updated last week
- High-speed Large Language Model Serving for Local Deployment · ★ 9,469 · May 11, 2026 · Updated last week
- Official implementation of Half-Quadratic Quantization (HQQ) · ★ 939 · Feb 26, 2026 · Updated 2 months ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm · ★ 679 · Apr 25, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★ 80,418 · Updated this week
- ★ 3,091 · Nov 21, 2025 · Updated 6 months ago
- Modeling, training, eval, and inference code for OLMo · ★ 6,507 · Nov 24, 2025 · Updated 5 months ago
- Local realtime voice AI · ★ 2,484 · Nov 26, 2025 · Updated 5 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ★ 395 · Feb 14, 2025 · Updated last year
- GRadient-INformed MoE · ★ 264 · Sep 25, 2024 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… · ★ 18,331 · Updated this week
- llama3 implementation one matrix multiplication at a time · ★ 15,236 · May 23, 2024 · Updated last year