MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,444 · Apr 30, 2026 · Updated last month
Alternatives and similar repositories for MobileLLM
Users that are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models · ☆3,805 · May 26, 2026 · Updated 2 weeks ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices · ☆668 · May 10, 2025 · Updated last year
- Strong and Open Vision Language Assistant for Mobile Devices · ☆1,355 · Apr 15, 2024 · Updated 2 years ago
- 4M: Massively Multimodal Masked Modeling · ☆1,798 · Jun 2, 2025 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. · ☆8,979 · May 3, 2024 · Updated 2 years ago
- Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model". · ☆929 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile · ☆3,623 · Sep 10, 2025 · Updated 9 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" · ☆127 · Oct 15, 2025 · Updated 7 months ago
- Data preparation code for Amber 7B LLM · ☆95 · May 10, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ☆3,556 · Jul 17, 2025 · Updated 10 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… · ☆3,143 · May 19, 2025 · Updated last year
- PyTorch native quantization and sparsity for training and inference · ☆2,847 · Updated this week
- On-device AI across mobile, embedded and edge for PyTorch · ☆4,716 · Updated this week
- Minimalistic large language model 3D-parallelism training · ☆2,711 · May 26, 2026 · Updated 2 weeks ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ☆2,097 · Jul 29, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ☆371 · Apr 13, 2026 · Updated last month
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ☆7,230 · Jul 11, 2024 · Updated last year
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality · ☆4,976 · Aug 10, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. · ☆4,763 · Jul 18, 2025 · Updated 10 months ago
- PyTorch implementation of models from the Zamba2 series. · ☆192 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. · ☆7,126 · May 6, 2026 · Updated last month
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. · ☆13,414 · Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… · ☆10,351 · May 16, 2026 · Updated 3 weeks ago
- Things you can do with the token embeddings of an LLM · ☆1,451 · Dec 1, 2025 · Updated 6 months ago
- Fast and memory-efficient exact attention · ☆24,037 · Jun 3, 2026 · Updated last week
- CoreNet: A library for training deep neural networks · ☆6,998 · Oct 9, 2025 · Updated 8 months ago
- PyTorch native post-training library · ☆5,768 · Updated this week
- Efficient Triton Kernels for LLM Training · ☆6,415 · Updated this week
- High-speed Large Language Model Serving for Local Deployment · ☆9,522 · May 11, 2026 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) · ☆940 · Feb 26, 2026 · Updated 3 months ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm · ☆680 · Apr 25, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆81,909 · Jun 4, 2026 · Updated last week
- ☆3,094 · Nov 21, 2025 · Updated 6 months ago
- Modeling, training, eval, and inference code for OLMo · ☆6,522 · Nov 24, 2025 · Updated 6 months ago
- Local realtime voice AI · ☆2,482 · Nov 26, 2025 · Updated 6 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ☆399 · Feb 14, 2025 · Updated last year
- GRadient-INformed MoE · ☆264 · Sep 25, 2024 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… · ☆18,343 · May 19, 2026 · Updated 3 weeks ago
- llama3 implementation one matrix multiplication at a time · ☆15,231 · May 23, 2024 · Updated 2 years ago