UbiquitousLearning / PhoneLM
☆65Updated last year
Alternatives and similar repositories for PhoneLM
Users who are interested in PhoneLM are comparing it to the libraries listed below
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models☆68Updated last year
- Lightweight toolkit package to train and fine-tune 1.58bit Language models☆109Updated 8 months ago
- ☆102Updated last year
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.☆276Updated 3 months ago
- FuseAI Project☆87Updated last year
- QeRL enables RL for 32B LLMs on a single H100 GPU.☆477Updated 2 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.☆218Updated 8 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated last year
- [ICLR 2026] Efficient Agent Training for Computer Use☆135Updated 4 months ago
- ☆64Updated 8 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models☆185Updated last year
- KV cache compression for high-throughput LLM inference☆151Updated 11 months ago
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.☆148Updated 3 months ago
- ☆62Updated 6 months ago
- High-speed and easy-to-use LLM serving framework for local deployment☆145Updated 5 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆176Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆120Updated 8 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding☆432Updated last week
- llama.cpp tutorial on Android phone☆144Updated 9 months ago
- Awesome Mobile LLMs☆301Updated 2 months ago
- ☆56Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model☆262Updated 8 months ago
- Official implementation for Training LLMs with MXFP4☆118Updated 9 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research☆282Updated this week
- ☆131Updated 8 months ago
- Efficient non-uniform quantization with GPTQ for GGUF☆58Updated 4 months ago
- The homepage of OneBit model quantization framework.☆200Updated 11 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline"☆118Updated last year
- Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"☆553Updated 2 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆137Updated last year