MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
★1,428 · Apr 21, 2025 · Updated last year
Alternatives and similar repositories for MobileLLM
Users who are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models ★3,737 · Apr 2, 2026 · Updated 3 weeks ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ★669 · May 10, 2025 · Updated 11 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ★1,350 · Apr 15, 2024 · Updated 2 years ago
- 4M: Massively Multimodal Masked Modeling ★1,794 · Jun 2, 2025 · Updated 10 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ★8,954 · May 3, 2024 · Updated last year
- Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model". ★930 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ★3,627 · Sep 10, 2025 · Updated 7 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ★123 · Oct 15, 2025 · Updated 6 months ago
- Data preparation code for Amber 7B LLM ★94 · May 10, 2024 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,512 · Jul 17, 2025 · Updated 9 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ★3,139 · May 19, 2025 · Updated 11 months ago
- PyTorch native quantization and sparsity for training and inference ★2,807 · Updated this week
- Minimalistic large language model 3D-parallelism training ★2,674 · Apr 7, 2026 · Updated 3 weeks ago
- On-device AI across mobile, embedded and edge for PyTorch ★4,547 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ★2,094 · Jul 29, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ★367 · Apr 13, 2026 · Updated 2 weeks ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ★4,835 · Aug 10, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★7,222 · Jul 11, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ★4,762 · Jul 18, 2025 · Updated 9 months ago
- PyTorch implementation of models from the Zamba2 series. ★193 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. ★7,023 · Mar 15, 2026 · Updated last month
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ★13,326 · Apr 25, 2026 · Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ★10,070 · Apr 24, 2026 · Updated last week
- Things you can do with the token embeddings of an LLM ★1,453 · Dec 1, 2025 · Updated 5 months ago
- Fast and memory-efficient exact attention ★23,563 · Updated this week
- CoreNet: A library for training deep neural networks ★7,008 · Oct 9, 2025 · Updated 6 months ago
- PyTorch native post-training library ★5,739 · Apr 24, 2026 · Updated last week
- High-speed Large Language Model Serving for Local Deployment ★9,390 · Jan 24, 2026 · Updated 3 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ★931 · Feb 26, 2026 · Updated 2 months ago
- Efficient Triton Kernels for LLM Training ★6,315 · Updated this week
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ★678 · Apr 25, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ★78,385 · Updated this week
- ★3,091 · Nov 21, 2025 · Updated 5 months ago
- Modeling, training, eval, and inference code for OLMo ★6,488 · Nov 24, 2025 · Updated 5 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ★390 · Feb 14, 2025 · Updated last year
- Local realtime voice AI ★2,483 · Nov 26, 2025 · Updated 5 months ago
- GRadient-INformed MoE ★264 · Sep 25, 2024 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ★18,309 · Apr 21, 2026 · Updated last week
- llama3 implementation one matrix multiplication at a time ★15,243 · May 23, 2024 · Updated last year