MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
★ 1,446 · Apr 30, 2026 · Updated last month
Alternatives and similar repositories for MobileLLM
Users who are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models · ★ 3,821 · May 26, 2026 · Updated 3 weeks ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices · ★ 667 · May 10, 2025 · Updated last year
- Strong and Open Vision Language Assistant for Mobile Devices · ★ 1,358 · Apr 15, 2024 · Updated 2 years ago
- 4M: Massively Multimodal Masked Modeling · ★ 1,798 · Jun 2, 2025 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. · ★ 8,987 · May 3, 2024 · Updated 2 years ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". · ★ 930 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile · ★ 3,624 · Sep 10, 2025 · Updated 9 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" · ★ 129 · Oct 15, 2025 · Updated 8 months ago
- Data preparation code for Amber 7B LLM · ★ 96 · May 10, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ★ 3,567 · Jul 17, 2025 · Updated 11 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… · ★ 3,141 · May 19, 2025 · Updated last year
- PyTorch native quantization and sparsity for training and inference · ★ 2,862 · Updated this week
- On-device AI across mobile, embedded and edge for PyTorch · ★ 4,733 · Updated this week
- Minimalistic large language model 3D-parallelism training · ★ 2,720 · May 26, 2026 · Updated 3 weeks ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ★ 2,101 · Jul 29, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ★ 372 · Apr 13, 2026 · Updated 2 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ★ 7,232 · Jul 11, 2024 · Updated last year
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality · ★ 5,041 · Aug 10, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. · ★ 4,761 · Jul 18, 2025 · Updated 11 months ago
- PyTorch implementation of models from the Zamba2 series. · ★ 194 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. · ★ 7,154 · Jun 13, 2026 · Updated last week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. · ★ 13,425 · Jun 9, 2026 · Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… · ★ 10,434 · May 16, 2026 · Updated last month
- Things you can do with the token embeddings of an LLM · ★ 1,451 · Dec 1, 2025 · Updated 6 months ago
- Fast and memory-efficient exact attention · ★ 24,170 · Updated this week
- CoreNet: A library for training deep neural networks · ★ 6,999 · Oct 9, 2025 · Updated 8 months ago
- PyTorch native post-training library · ★ 5,774 · Updated this week
- Efficient Triton Kernels for LLM Training · ★ 6,444 · Updated this week
- High-speed Large Language Model Serving for Local Deployment · ★ 9,568 · May 11, 2026 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) · ★ 944 · Feb 26, 2026 · Updated 3 months ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm · ★ 680 · Apr 25, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★ 83,135 · Updated this week
- ★ 3,090 · Updated this week
- Modeling, training, eval, and inference code for OLMo · ★ 6,560 · Nov 24, 2025 · Updated 6 months ago
- Local realtime voice AI · ★ 2,484 · Nov 26, 2025 · Updated 6 months ago
- Code repo for the paper "SpinQuant LLM quantization with learned rotations" · ★ 403 · Feb 14, 2025 · Updated last year
- GRadient-INformed MoE · ★ 264 · Sep 25, 2024 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… · ★ 18,368 · May 19, 2026 · Updated last month
- llama3 implementation one matrix multiplication at a time · ★ 15,229 · May 23, 2024 · Updated 2 years ago