MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,417 · Apr 21, 2025 · Updated 11 months ago
Alternatives and similar repositories for MobileLLM
Users who are interested in MobileLLM are comparing it to the libraries listed below.
- Everything about the SmolLM and SmolVLM family of models ☆3,675 · Jan 13, 2026 · Updated 2 months ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆669 · May 10, 2025 · Updated 10 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,345 · Apr 15, 2024 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,922 · May 3, 2024 · Updated last year
- 4M: Massively Multimodal Masked Modeling ☆1,787 · Jun 2, 2025 · Updated 9 months ago
- Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model". ☆929 · Oct 28, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,624 · Sep 10, 2025 · Updated 6 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆119 · Oct 15, 2025 · Updated 5 months ago
- Data preparation code for Amber 7B LLM ☆93 · May 10, 2024 · Updated last year
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,135 · May 19, 2025 · Updated 10 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,463 · Jul 17, 2025 · Updated 8 months ago
- PyTorch native quantization and sparsity for training and inference ☆2,739 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆2,617 · Feb 19, 2026 · Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆361 · Feb 5, 2026 · Updated last month
- On-device AI across mobile, embedded and edge for PyTorch ☆4,386 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,087 · Jul 29, 2024 · Updated last year
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,698 · Aug 10, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,201 · Jul 11, 2024 · Updated last year
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,757 · Jul 18, 2025 · Updated 8 months ago
- PyTorch implementation of models from the Zamba2 series. ☆189 · Jan 23, 2025 · Updated last year
- Tools for merging pretrained large language models. ☆6,867 · Mar 15, 2026 · Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,832 · Mar 4, 2026 · Updated 2 weeks ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,228 · Mar 6, 2026 · Updated 2 weeks ago
- Things you can do with the token embeddings of an LLM ☆1,453 · Dec 1, 2025 · Updated 3 months ago
- Fast and memory-efficient exact attention ☆22,832 · Updated this week
- CoreNet: A library for training deep neural networks ☆7,009 · Oct 9, 2025 · Updated 5 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,834 · Jan 24, 2026 · Updated last month
- PyTorch native post-training library ☆5,707 · Updated this week
- Efficient Triton Kernels for LLM Training ☆6,216 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆919 · Feb 26, 2026 · Updated 3 weeks ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆674 · Apr 25, 2025 · Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆73,479 · Updated this week
- ☆3,084 · Nov 21, 2025 · Updated 4 months ago
- Modeling, training, eval, and inference code for OLMo ☆6,404 · Nov 24, 2025 · Updated 3 months ago
- Local realtime voice AI ☆2,439 · Nov 26, 2025 · Updated 3 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆380 · Feb 14, 2025 · Updated last year
- GRadient-INformed MoE ☆264 · Sep 25, 2024 · Updated last year
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆18,261 · Mar 3, 2026 · Updated 2 weeks ago
- llama3 implementation one matrix multiplication at a time ☆15,252 · May 23, 2024 · Updated last year