UbiquitousLearning / PhoneLM
☆56 · Updated 5 months ago
Alternatives and similar repositories for PhoneLM:
Users interested in PhoneLM are comparing it to the libraries listed below.
- ☆90 · Updated 6 months ago
- Simple extension on vLLM to help you speed up reasoning models without training. ☆146 · Updated this week
- ☆85 · Updated 2 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 · Updated 7 months ago
- Data preparation code for the CrystalCoder 7B LLM ☆44 · Updated 11 months ago
- FuseAI Project ☆85 · Updated 3 months ago
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆115 · Updated 4 months ago
- A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's LLMs. ☆82 · Updated last month
- KV cache compression for high-throughput LLM inference ☆126 · Updated 2 months ago
- ☆214 · Updated this week
- [ICML 2024] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 · Updated 3 months ago
- [NeurIPS 2024 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆162 · Updated 3 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆263 · Updated 6 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆123 · Updated 4 months ago
- [ICLR 2025] Fast Inference of MoE Models with CPU-GPU Orchestration ☆208 · Updated 5 months ago
- A systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆86 · Updated 2 weeks ago
- ☆46 · Updated 9 months ago
- Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆25 · Updated last month
- The homepage of the OneBit model quantization framework. ☆175 · Updated 2 months ago
- ☆56 · Updated last week
- PB-LLM: Partially Binarized Large Language Models ☆151 · Updated last year
- Data preparation code for the Amber 7B LLM ☆88 · Updated 11 months ago
- ☆75 · Updated last year
- ☆45 · Updated 9 months ago
- Awesome Mobile LLMs ☆169 · Updated last month
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 10 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆158 · Updated 10 months ago
- Experiments on speculative sampling with Llama models ☆125 · Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆161 · Updated this week
- ☆37 · Updated 6 months ago